Retry follow task when remote connection queue full #55314

dnhatn · 2020-04-16T14:48:05Z

If more than 100 shard-follow tasks are trying to connect to the remote cluster, then some of them will abort with "connect listener queue is full". This is because we retry on EsRejectedExecutionException, but not on RejectedExecutionException.
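For illustration only, here is a minimal sketch of the kind of retry check described above. It is not the code in this PR: `RetryPredicateSketch`, `shouldRetry`, and the stand-in `EsLikeRejectedExecutionException` are hypothetical names, and the stand-in deliberately does not extend the JDK's `RejectedExecutionException` so that the gap is visible.

```java
import java.util.concurrent.RejectedExecutionException;

public class RetryPredicateSketch {

    /** Hypothetical stand-in for a project-specific rejection type (assumption, not the real class). */
    public static class EsLikeRejectedExecutionException extends RuntimeException {
        public EsLikeRejectedExecutionException(String message) {
            super(message);
        }
    }

    /**
     * Returns true if the failure, or one of its causes, is a rejection that is
     * worth retrying. Checking only the project-specific type would miss the
     * plain JDK RejectedExecutionException raised when a queue is full.
     */
    public static boolean shouldRetry(Throwable failure) {
        Throwable current = failure;
        // Bound the walk so that a cyclic cause chain cannot loop forever.
        for (int depth = 0; current != null && depth < 10; depth++) {
            if (current instanceof EsLikeRejectedExecutionException
                || current instanceof RejectedExecutionException) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable queueFull = new RejectedExecutionException("connect listener queue is full");
        System.out.println("retry on queue full: " + shouldRetry(queueFull));
        System.out.println("retry on other error: " + shouldRetry(new IllegalStateException("boom")));
    }
}
```

With a check like this, a follow task that hits a full queue would be rescheduled rather than aborted.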

elasticmachine · 2020-04-16T14:48:08Z

Pinging @elastic/es-distributed (:Distributed/CCR)

Tim-Brooks · 2020-04-16T16:59:31Z

server/src/main/java/org/elasticsearch/transport/RemoteConnectionStrategy.java

@@ -105,10 +105,14 @@ public int getNumberOfChannels() {
            Setting.Property.NodeScope,
            Setting.Property.Dynamic));

+    // this setting is intentionally not registered, it is only used in tests
+    public static final Setting<Integer> REMOTE_MAX_CONNECTION_QUEUE_SIZE =
+        Setting.intSetting("cluster.remote.max_connection_queue_size", 100, Setting.Property.NodeScope);


I don't think there was a lot of thought to the connection listener limit. If there is a strong reason to increase it past 100 we could probably do that. Also, does this name make sense? We only allow a single connection round at a time. Should the name be cluster.remote.max_pending_connection_listeners?

> Should the name be cluster.remote.max_pending_connection_listeners?

++. I renamed it in f9c807f.

> I don't think there was a lot of thought to the connection listener limit. If there is a strong reason to increase it past 100 we could probably do that.

Yeah, I think we chose this value quite arbitrarily. I think it's fine to increase this value as we should not have many concurrent remote searches, and CCR will retry on this error anyway. I've increased this to 1000. WDYT?
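To make the limit under discussion concrete, here is a rough sketch of a bounded pending-listener queue. It assumes that a listener arriving while the queue is full is failed immediately with a `RejectedExecutionException`, and that all queued listeners are notified together when a connection round finishes. The class and method names are hypothetical and this is not the actual `RemoteConnectionStrategy` code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.Consumer;

public class PendingListenerQueueSketch {

    private final int maxPendingListeners;
    // Each listener receives null on success or the failure otherwise (sketch convention).
    private final ArrayDeque<Consumer<Exception>> pending = new ArrayDeque<>();

    public PendingListenerQueueSketch(int maxPendingListeners) {
        this.maxPendingListeners = maxPendingListeners;
    }

    /** Queues a listener, or fails it right away if the queue is already full. */
    public void add(Consumer<Exception> listener) {
        boolean rejected;
        synchronized (this) {
            rejected = pending.size() >= maxPendingListeners;
            if (!rejected) {
                pending.add(listener);
            }
        }
        if (rejected) {
            // Notify outside the lock so listener code cannot block other callers.
            listener.accept(new RejectedExecutionException("connect listener queue is full"));
        }
    }

    /** Notifies and clears every queued listener once the connection round is done. */
    public void completeAll(Exception failureOrNull) {
        List<Consumer<Exception>> toNotify;
        synchronized (this) {
            toNotify = new ArrayList<>(pending);
            pending.clear();
        }
        for (Consumer<Exception> listener : toNotify) {
            listener.accept(failureOrNull);
        }
    }

    public synchronized int pendingCount() {
        return pending.size();
    }
}
```

Under these assumptions, raising the cap only makes rejections rarer. Callers such as follow tasks still need to treat the rejection as retryable.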

Tim-Brooks

LGTM

dnhatn · 2020-04-17T04:10:44Z

@tbrooks8 Thanks for reviewing.

If more than 100 shard-follow tasks are trying to connect to the remote cluster, then some of them will abort with "connect listener queue is full". This is because we retry on EsRejectedExecutionException, but not on RejectedExecutionException. Backport of #55314

Retry follow task when remote connection queue full

8def922

dnhatn added the >bug, :Distributed Indexing/CCR, v8.0.0, v7.6.3, v6.8.9, v7.8.0, and v7.7.1 labels Apr 16, 2020

dnhatn requested review from martijnvg, Tim-Brooks and jasontedor April 16, 2020 14:48

Merge branch 'master' into remote-connect-queue

8e911a4

Tim-Brooks reviewed Apr 16, 2020

dnhatn added 2 commits April 16, 2020 13:23

rename to max_pending_connection_listeners

f9c807f

increase to 1k

6c3e803

dnhatn requested a review from Tim-Brooks April 16, 2020 17:30

Tim-Brooks approved these changes Apr 16, 2020

dnhatn merged commit 5216bd2 into elastic:master Apr 17, 2020

dnhatn deleted the remote-connect-queue branch April 17, 2020 04:10

dnhatn added the backport pending label Apr 17, 2020

dnhatn mentioned this pull request May 1, 2020

Retry follow task when remote connection queue full #56073

Merged

dnhatn removed the backport pending label May 2, 2020

jakelandis removed the v8.0.0 label Jul 26, 2021

jakelandis added the v8.0.0-alpha1 label Jul 26, 2021
