[Feature Request] Configuration to customize discovery/zen/fd/master_ping #36822

kimxogus · 2018-12-19T10:15:24Z

Describe the feature:

A configuration option to customize discovery/zen/fd/master_ping, so that Elasticsearch can skip pinging and waiting for the old master before moving to a new master.

In a Kubernetes environment, the IP address of each cluster node is assigned to a pod, which is a Docker container. When a pod (node) is terminated, pings to the old master's address time out, because the newly created pod (node) gets a different IP address. In this situation the cluster is unavailable for `discovery.zen.join_timeout`, which defaults to 20 times the ping timeout (see the [documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#master-election)), so the outage lasts more than a minute. Reducing `ping_timeout` below 1 second is too dangerous (it may cause problems in master election), and keeping the pod IP alive for several seconds after Elasticsearch receives SIGTERM, just so that pings keep working, doesn't seem to be a proper solution. Following [this discussion](https://discuss.elastic.co/t/timed-out-waiting-for-all-nodes-to-process-published-state-and-cluster-unavailability/138590), I believe that adding a config option to make Elasticsearch skip pinging and waiting for the old master before moving to a new master would be a good solution.

reference: [stable/elasticsearch] Terminating current master pod causes cluster outage of more than 30 seconds helm/charts#8785 , https://discuss.elastic.co/t/timed-out-waiting-for-all-nodes-to-process-published-state-and-cluster-unavailability/138590

Elasticsearch version (bin/elasticsearch --version): 6.2.3

Plugins installed: [ingest-geoip, ingest-user-agent, repository-s3]

JVM version (java -version):

openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)

OS version (uname -a if on a Unix-like system): Linux {HOSTNAME} 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

Steps to reproduce:

1. Deploy an Elasticsearch cluster in Kubernetes (with the helm chart in my case).
2. Terminate the current master pod (node).
3. A new master is elected within 3~5 seconds, but no node in the cluster responds to HTTP requests for about 1 minute (with discovery.zen.ping_timeout=3s and discovery.zen.fd.ping_timeout=3s).

Provide logs (if relevant):

[2018-12-19T09:12:33,326][INFO ][o.e.c.s.ClusterApplierService] [es-monitoring-elasticsearch-master-0] detected_master {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}, added {{es-monitoring-elasticsearch-client-57654b8f98-p47cm}{HJlePFqgQxq_wmFDEDNQEw}{Thx48_UDSL2CwzLrg0NL2w}{100.96.161.172}{100.96.161.172:9300},{es-monitoring-elasticsearch-master-2}{v3FjSTfcQ4OHAzXCzDcKFQ}{e1X2hVV8SIOkDk1wvE3LKw}{100.96.162.240}{100.96.162.240:9300},{es-monitoring-elasticsearch-data-2}{V2meIqpNTQOH8zY4PCtQ7g}{Pr9uoG03Qc6Xx2h4x-o62A}{100.96.162.225}{100.96.162.225:9300},{es-monitoring-elasticsearch-data-1}{mqfXo0yqTaCcEc956tVmpA}{NQehgvsvQq2Kh1K6tKZaxA}{100.96.161.175}{100.96.161.175:9300},{es-monitoring-elasticsearch-data-0}{rn-v-yB8RbeoHXovkC4UYQ}{vF1s-vC7TheNqloIyEJg4A}{100.96.165.88}{100.96.165.88:9300},{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300},{es-monitoring-elasticsearch-client-57654b8f98-dgvxm}{462pBrdyScC9WgmlkJr8ug}{vv_jrSxbTHi3wo-r03k0fQ}{100.96.166.205}{100.96.166.205:9300},}, reason: apply cluster state (from master [master {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300} committed version [367]])
[2018-12-19T09:12:43,331][INFO ][o.e.d.z.ZenDiscovery     ] [es-monitoring-elasticsearch-master-0] master_left [{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}], reason [failed to ping, tried [3] times, each with  maximum [3s] timeout]
[2018-12-19T09:12:43,332][WARN ][o.e.d.z.ZenDiscovery     ] [es-monitoring-elasticsearch-master-0] master left (reason = failed to ping, tried [3] times, each with  maximum [3s] timeout), current nodes: nodes:
   {es-monitoring-elasticsearch-client-57654b8f98-p47cm}{HJlePFqgQxq_wmFDEDNQEw}{Thx48_UDSL2CwzLrg0NL2w}{100.96.161.172}{100.96.161.172:9300}
   {es-monitoring-elasticsearch-master-2}{v3FjSTfcQ4OHAzXCzDcKFQ}{e1X2hVV8SIOkDk1wvE3LKw}{100.96.162.240}{100.96.162.240:9300}
   {es-monitoring-elasticsearch-data-2}{V2meIqpNTQOH8zY4PCtQ7g}{Pr9uoG03Qc6Xx2h4x-o62A}{100.96.162.225}{100.96.162.225:9300}
   {es-monitoring-elasticsearch-master-0}{K6kMktL9QJC2sc7K-35McA}{srwO3u3SS9GYAWYeLyUn-g}{100.96.165.141}{100.96.165.141:9300}, local
   {es-monitoring-elasticsearch-data-1}{mqfXo0yqTaCcEc956tVmpA}{NQehgvsvQq2Kh1K6tKZaxA}{100.96.161.175}{100.96.161.175:9300}
   {es-monitoring-elasticsearch-data-0}{rn-v-yB8RbeoHXovkC4UYQ}{vF1s-vC7TheNqloIyEJg4A}{100.96.165.88}{100.96.165.88:9300}
   {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}, master
   {es-monitoring-elasticsearch-client-57654b8f98-dgvxm}{462pBrdyScC9WgmlkJr8ug}{vv_jrSxbTHi3wo-r03k0fQ}{100.96.166.205}{100.96.166.205:9300}

[2018-12-19T09:12:57,612][INFO ][o.e.d.z.ZenDiscovery     ] [es-monitoring-elasticsearch-master-0] failed to send join request to master [{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]
[2018-12-19T09:13:01,851][INFO ][o.e.c.s.ClusterApplierService] [es-monitoring-elasticsearch-master-0] detected_master {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}, reason: apply cluster state (from master [master {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300} committed version [368]])
[2018-12-19T09:13:01,857][WARN ][o.e.t.TransportService   ] [es-monitoring-elasticsearch-master-0] Received response for a request that has timed out, sent [27532ms] ago, timed out [24531ms] ago, action [internal:discovery/zen/fd/master_ping], node [{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}], id [29]
[2018-12-19T09:13:01,857][WARN ][o.e.t.TransportService   ] [es-monitoring-elasticsearch-master-0] Received response for a request that has timed out, sent [24529ms] ago, timed out [21529ms] ago, action [internal:discovery/zen/fd/master_ping], node [{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}], id [30]
[2018-12-19T09:13:01,858][WARN ][o.e.t.TransportService   ] [es-monitoring-elasticsearch-master-0] Received response for a request that has timed out, sent [21530ms] ago, timed out [18530ms] ago, action [internal:discovery/zen/fd/master_ping], node [{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}], id [31]
[2018-12-19T09:15:54,284][INFO ][o.e.c.s.ClusterApplierService] [es-monitoring-elasticsearch-master-0] removed {{es-monitoring-elasticsearch-client-57654b8f98-dgvxm}{462pBrdyScC9WgmlkJr8ug}{vv_jrSxbTHi3wo-r03k0fQ}{100.96.166.205}{100.96.166.205:9300},}, reason: apply cluster state (from master [master {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300} committed version [369]])

elasticmachine · 2018-12-19T10:44:29Z

Pinging @elastic/es-distributed

dliappis · 2018-12-19T10:48:24Z

@DaveCTurner not sure if it makes sense to consider a zen proposal such as this, given zen2 progress.

dliappis · 2018-12-19T10:57:30Z

@kimxogus Might be worth taking a look at Elastic's own helm chart -- currently in alpha status -- for Elasticsearch and esp. the clustering and node discovery approach.

DaveCTurner · 2018-12-19T11:26:44Z

We certainly won't fix this as described - the fault detection and master election mechanisms are completely changing for 7.0 as described in #32006 - but I do think we can do better in this situation. Marking this for team discussion.

The proposal doesn't actually fix the problem described anyway, because it's not a pinging problem:

New master is elected within 3~5 seconds, but any member node in cluster doesn't
respond to http requests about 1 minute(with discovery.zen.ping_timeout=3s and discovery.zen.fd.ping_timeout=3s).

I think the actual problem here is #29025, but a more orderly master handover process would also help.

DaveCTurner · 2018-12-19T11:40:51Z

On Linux, reducing net.ipv4.tcp_retries2 (the sysctl, i.e. /proc/sys/net/ipv4/tcp_retries2, not an Elasticsearch setting) ought to help here too. See #34405 (comment).
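When the sysctl is changed from an init container, it can be worth confirming the value that the Elasticsearch container actually sees. A minimal sketch in Python that reads it back from procfs; the helper name `read_tcp_retries2` and its `path` parameter are illustrative assumptions, not part of any Elasticsearch or Kubernetes tooling.

```python
from pathlib import Path

# Location of the sysctl on Linux, as mentioned in the comment above.
TCP_RETRIES2_PATH = "/proc/sys/net/ipv4/tcp_retries2"


def read_tcp_retries2(path: str = TCP_RETRIES2_PATH) -> int:
    """Return the tcp_retries2 value stored at `path` as an integer."""
    return int(Path(path).read_text().strip())


# Usage on a Linux host or inside the container:
#   print("net.ipv4.tcp_retries2 =", read_tcp_retries2())
```

The `path` parameter only exists so the helper can be pointed at a different file for testing.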

kimxogus · 2018-12-20T01:12:16Z

Reducing net.ipv4.tcp_retries2 to 3 didn't help in my case, with either https://github.com/helm/charts/tree/master/stable/elasticsearch or Elastic's own chart.
I made sure that net.ipv4.tcp_retries2 = 3 and discovery.zen.ping_timeout = 3s, but the outage still lasted about 1m.

kimxogus · 2018-12-20T01:13:51Z

And internal:discovery/zen/fd/master_ping is still taking longer than 120000ms.

Log from Elastic's own chart with net.ipv4.tcp_retries2 = 3 and discovery.zen.ping_timeout = 3s:

[2018-12-20T01:10:54,073][WARN ][o.e.t.TransportService   ] [elasticsearch-master-0] Received response for a request that has timed out, sent [148120ms] ago, timed out [145119ms] ago, action [internal:discovery/zen/fd/master_ping], node [{elasticsearch-master-1}{tmWFtydtQBiU54yOoiN0Jw}{pr09c9sPQtGOEZ2AtdUaFQ}{100.96.166.76}{100.96.166.76:9300}{ml.machine_memory=805306368, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], id [77]
[2018-12-20T01:10:54,074][WARN ][o.e.t.TransportService   ] [elasticsearch-master-0] Received response for a request that has timed out, sent [145119ms] ago, timed out [142117ms] ago, action [internal:discovery/zen/fd/master_ping], node [{elasticsearch-master-1}{tmWFtydtQBiU54yOoiN0Jw}{pr09c9sPQtGOEZ2AtdUaFQ}{100.96.166.76}{100.96.166.76:9300}{ml.machine_memory=805306368, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], id [78]
[2018-12-20T01:10:54,074][WARN ][o.e.t.TransportService   ] [elasticsearch-master-0] Received response for a request that has timed out, sent [142117ms] ago, timed out [139115ms] ago, action [internal:discovery/zen/fd/master_ping], node [{elasticsearch-master-1}{tmWFtydtQBiU54yOoiN0Jw}{pr09c9sPQtGOEZ2AtdUaFQ}{100.96.166.76}{100.96.166.76:9300}{ml.machine_memory=805306368, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], id [79]

DaveCTurner · 2018-12-20T09:31:53Z

I do not understand what these messages have to do with the original post, or how you managed to get them. The OP was talking about shutting down a master, but if the master were shut down then it'd never respond, so that's not how these messages arose. Also these requests timed out after 3 seconds, and Elasticsearch reacted to the timeout at that time.
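The 3-second figure can be read off the warnings themselves: the difference between the `sent` and `timed out` fields is the timeout that was applied, and the `timed out` field is how late the response arrived. A small sketch in Python, assuming the message format shown in the logs above; the function name `timeout_and_lateness_ms` is made up for illustration.

```python
import re
from typing import Tuple

# Matches the timing fields of the TransportService warning shown in the logs.
LATE_RESPONSE = re.compile(
    r"sent \[(?P<sent>\d+)ms\] ago, timed out \[(?P<timed_out>\d+)ms\] ago"
)


def timeout_and_lateness_ms(line: str) -> Tuple[int, int]:
    """Return (timeout that was applied, how late the response arrived) in ms."""
    match = LATE_RESPONSE.search(line)
    if match is None:
        raise ValueError("not a late-response warning")
    sent = int(match["sent"])
    timed_out = int(match["timed_out"])
    # The request timed out (sent - timed_out) ms after it was sent;
    # the response then arrived timed_out ms after the timeout fired.
    return sent - timed_out, timed_out


line = (
    "Received response for a request that has timed out, "
    "sent [148120ms] ago, timed out [145119ms] ago, "
    "action [internal:discovery/zen/fd/master_ping]"
)
print(timeout_and_lateness_ms(line))  # (3001, 145119)
```

For the line above, the timeout fired about 3 seconds after the ping was sent, and the response showed up roughly 145 seconds later.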

DaveCTurner · 2018-12-20T09:35:21Z

Could you share logs from both the old, stopping, master and the newly-elected master for the time period from when the old master stopped until the new master was elected and the cluster has fully recovered?

kimxogus · 2018-12-21T07:57:49Z

@DaveCTurner
This is my test chart based on https://github.com/elastic/helm-charts.

I created a test master cluster with helm install ./elasticsearch --name es-test

and collected logs with the logger.level=debug config option.

The old master (master-0) received SIGTERM at about 2018-12-21T07:46:16,521, and the outage lasted about 1 minute.

kimxogus · 2018-12-21T08:03:47Z

vm.max_map_count=262144 and net.ipv4.tcp_retries2=3 are set via sysctl in the k8s pod's init containers.
logger.level=debug, discovery.zen.ping_timeout=3s and discovery.zen.fd.ping_timeout=3s are set through environment variables.

All other settings are the default values of the original chart, and the image is the official image.

DaveCTurner · 2018-12-21T10:41:20Z

Thanks, the logs were helpful. The issue you are facing is related to #29025: the first cluster state update from the new master causes all the nodes to try and re-establish their connections to the old master, expecting this either to succeed or fail immediately. However Docker's network doesn't behave as expected: if the container has completely gone away, connection attempts receive no response and eventually time out. Worse, we try twice before continuing, so it takes two connection timeouts (each 30 seconds by default) before the cluster proceeds.

I would reset your ping_timeout since it's actually making things a bit worse here, and instead consider reducing transport.tcp.connect_timeout until #29025 is resolved.

DaveCTurner · 2018-12-21T10:43:12Z

Duplicates #29025.

kimxogus · 2018-12-21T11:17:32Z

Thank you 👍
Reducing transport.tcp.connect_timeout to 2~3 seconds brought the outage down to around 8~10 seconds.
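As a rough back-of-the-envelope check of the explanation above (two connection attempts to the vanished master, each bounded by transport.tcp.connect_timeout), the time spent waiting on reconnects can be sketched in Python. This is only an estimate under that stated assumption, and the function name `reconnect_stall_seconds` is hypothetical. It does not model fault detection or the election itself, which would account for the rest of the observed outage.

```python
def reconnect_stall_seconds(connect_timeout_s: float, attempts: int = 2) -> float:
    """Estimate the time spent waiting on connection attempts to a node that is gone."""
    return attempts * connect_timeout_s


# Default 30 s timeout and two attempts: about a minute, matching the outage reported.
print(reconnect_stall_seconds(30))  # 60

# With the timeout lowered to 2 or 3 s, the reconnect stall shrinks to 4 or 6 s.
print(reconnect_stall_seconds(2), reconnect_stall_seconds(3))  # 4 6
```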

kimxogus changed the title ~~Configuration to customize discovery/zen/fd/master_ping~~ [Feature Request] Configuration to customize discovery/zen/fd/master_ping Dec 19, 2018

kimxogus mentioned this issue Dec 19, 2018

[stable/elasticsearch] Terminating current master pod causes cluster outage of more than 30 seconds helm/charts#8785

Closed

dliappis added the :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. label Dec 19, 2018

dliappis added the >enhancement label Dec 19, 2018

DaveCTurner added the team-discuss label Dec 19, 2018

DaveCTurner closed this as completed Dec 21, 2018

kimxogus mentioned this issue Jan 17, 2019

[stable/elasticsearch] fix cluster outage during master termination helm/charts#10687

Closed

3 tasks

kimxogus mentioned this issue Jan 29, 2019

fix cluster outage, add masterService template elastic/helm-charts#41

Closed

andreykaipov mentioned this issue Feb 18, 2019

Slow re-election when elected master pod is deleted elastic/helm-charts#63

Closed
