Issues with Nest Connection Pool retry mechanism #1080
Comments
Just trying to understand the code. Is this really a problem with updating the NodeList during ping/sniff?
Hi @satishmallik, I've been thinking about this all week, and there are two ways we can solve this. The retry logic is contained in Transport and the RequestHandlers, which coordinate the requests over IConnections. They do not know which exception from the IConnection is a timeout exception (these differ between, e.g., Thrift and web requests).
We can solve it globally by introducing a
Or extend the
My personal preference is option 1, although I'm struggling to find sane defaults. We could pair
…luster Conflicts: src/Elasticsearch.Net/Connection/Configuration/IConnectionConfigurationValues.cs src/Elasticsearch.Net/Connection/RequestHandlers/RequestHandlerBase.cs src/Tests/Elasticsearch.Net.Tests.Unit/Elasticsearch.Net.Tests.Unit.csproj
Currently NEST cannot differentiate between a Service Unavailable exception and a Timeout exception.
Problem description
A typical response where this situation arises looks like:
Currently NEST retries every retryable exception, including timeouts, on all nodes of the connection pool.
Say a wildcard query is sent to query node q1. If it times out, the query as a whole should time out.
Instead, NEST sends the same query to another node, q2. It times out again and is then sent to q3. With a 1-minute timeout per node, the query fails after 3 minutes instead of 1.
This also opens the door to a denial-of-service (DoS) attack, since a single expensive query ends up loading every node in the pool.
A proper fix would be to differentiate between a timeout and a ServiceNotAvailable exception. The query should be sent to another node from the connection pool only if the service is not available on the first node.
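To make the proposed behaviour concrete, here is a minimal, language-neutral sketch in Python. It is not the NEST or Elasticsearch.Net API: `NodeTimeout`, `NodeUnavailable`, `send_with_failover`, and the `send` callable are all hypothetical names introduced only for illustration. It assumes the connection layer can already tell the two failure kinds apart.

```python
class NodeTimeout(Exception):
    """Hypothetical: the node accepted the request but did not answer in time."""


class NodeUnavailable(Exception):
    """Hypothetical: the node could not serve the request at all (e.g. HTTP 503)."""


def send_with_failover(nodes, send):
    """Try each node in order, failing over only when a node is unavailable.

    `send` is a hypothetical callable that takes a node name and returns a
    response, raising NodeTimeout or NodeUnavailable on failure.
    """
    last_error = None
    for node in nodes:
        try:
            return send(node)
        except NodeTimeout:
            # The query itself is slow; retrying on another node would only
            # multiply the total wait, so surface the timeout immediately.
            raise
        except NodeUnavailable as exc:
            # This node is down; the next node in the pool may still work.
            last_error = exc
    raise NodeUnavailable("no node in the pool could serve the request") from last_error
```

With this policy, the wildcard query from the example above would fail after the first timeout on q1 rather than being replayed on q2 and q3, while a node that is genuinely down would still be skipped.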