Retry follow task when remote connection queue full #55314
Pinging @elastic/es-distributed (:Distributed/CCR)
@@ -105,10 +105,14 @@ public int getNumberOfChannels() {
Setting.Property.NodeScope,
Setting.Property.Dynamic));

// this setting is intentionally not registered, it is only used in tests
public static final Setting<Integer> REMOTE_MAX_CONNECTION_QUEUE_SIZE =
    Setting.intSetting("cluster.remote.max_connection_queue_size", 100, Setting.Property.NodeScope);
I don't think a lot of thought went into the connection listener limit. If there is a strong reason to increase it past 100, we could probably do that. Also, does this name make sense? We only allow a single connection round at a time. Should the name be cluster.remote.max_pending_connection_listeners?
> Should the name be cluster.remote.max_pending_connection_listeners?

++. I renamed it in f9c807f.

> I don't think there was a lot of thought to the connection listener limit. If there is a strong reason to increase it past 100 we could probably do that.

Yeah, I think we chose this value quite arbitrarily. I think it's fine to increase this value as we should not have many concurrent remote searches, and CCR will retry on this error anyway. I've increased this to 1000. WDYT?
LGTM
@tbrooks8 Thanks for reviewing.
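For readers following the thread, here is a minimal sketch of the kind of bounded pending-listener queue the review discusses. It is not the Elasticsearch implementation: the class `PendingListenersSketch` and its methods are hypothetical names, and the only details taken from the conversation are the idea of a cap on pending connection listeners and the "connect listener queue is full" message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical illustration only; names and structure are assumptions, not Elasticsearch code.
public class PendingListenersSketch {
    private final int maxPendingListeners;
    private final List<Runnable> pending = new ArrayList<>();

    PendingListenersSketch(int maxPendingListeners) {
        this.maxPendingListeners = maxPendingListeners;
    }

    // Queue a listener to be notified when the (single) connection round finishes.
    // Once the cap is reached, further callers are rejected instead of queued.
    synchronized void addListener(Runnable listener) {
        if (pending.size() >= maxPendingListeners) {
            throw new RejectedExecutionException("connect listener queue is full");
        }
        pending.add(listener);
    }

    // Called when the connection round completes: notify and clear all waiting listeners.
    // Returns how many listeners were notified.
    synchronized int connectionRoundFinished() {
        List<Runnable> toNotify = new ArrayList<>(pending);
        pending.clear();
        toNotify.forEach(Runnable::run);
        return toNotify.size();
    }

    public static void main(String[] args) {
        PendingListenersSketch queue = new PendingListenersSketch(2);
        queue.addListener(() -> System.out.println("listener 1 notified"));
        queue.addListener(() -> System.out.println("listener 2 notified"));
        try {
            queue.addListener(() -> System.out.println("never queued"));
        } catch (RejectedExecutionException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        queue.connectionRoundFinished();
    }
}
```

With a cap like this, raising the limit only makes rejection rarer; callers that can hit the cap still need to treat the rejection as retryable.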
If more than 100 shard-follow tasks are trying to connect to the remote cluster, then some of them will abort with "connect listener queue is full". This is because we retry on ESRejectedExecutionException, but not on RejectedExecutionException.
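The gap described above can be sketched as follows. This is an illustration under assumptions, not the actual CCR code: `EsRejectedStandIn` is a hypothetical stand-in for the Elasticsearch-specific exception and is assumed to be a subclass of `java.util.concurrent.RejectedExecutionException`; `shouldRetryNarrow`, `shouldRetryBroad`, and `callWithRetry` are made-up names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical illustration only; names are assumptions, not Elasticsearch code.
public class RetryOnRejectionSketch {

    // Stand-in for the Elasticsearch-specific exception, assumed to extend the JDK type.
    static class EsRejectedStandIn extends RejectedExecutionException {
        EsRejectedStandIn(String message) {
            super(message);
        }
    }

    // Too narrow: a plain RejectedExecutionException is not treated as retryable.
    static boolean shouldRetryNarrow(Throwable e) {
        return e instanceof EsRejectedStandIn;
    }

    // Broader check: matches the JDK type and therefore the subclass as well.
    static boolean shouldRetryBroad(Throwable e) {
        return e instanceof RejectedExecutionException;
    }

    // Retry the task while the failure is retryable and attempts remain.
    static <T> T callWithRetry(Callable<T> task, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !shouldRetryBroad(e)) {
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Throwable queueFull = new RejectedExecutionException("connect listener queue is full");
        System.out.println("narrow check retries: " + shouldRetryNarrow(queueFull));
        System.out.println("broad check retries: " + shouldRetryBroad(queueFull));
    }
}
```

A real follow task would normally also back off between attempts; the loop here omits that to keep the type check in focus.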
Backport of #55314