Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failover broken when hard exceptions occured when using *Async methods #803

Merged
merged 14 commits into from
Jul 18, 2014

Conversation

Mpdreamz
Copy link
Member

When IConnection throws a hard exception in any of the *Async methods before actual async work has been done the async routines in Transport.cs do not properly failover as they only failover properly for tasks in faulted state.

In most cases the exception happens in a way that causes a faulted task to be returned (when reading responses asynchronously) but some of the exception happen earlier with DNS lookups (see #802) these are always synchronously and will throw a hard exception.

This PR also fixes the case where a hard exception during pinging/sniffing using the Async methods could cause a connection not to failover.

The synchronous versions all behave correctly during hard exceptions.

Mpdreamz added 6 commits July 17, 2014 23:52
…arly as suppose to inner maxretry exceptions. Added integration tests for the exception (which behave different in practice then theory (unit tests
… nodeif all the nodes are dead over picking the one thats dead longest would work better in practice
… Works in unit tests not in integration tests..
…uest) causing ElasticsearchServerException to throw a nullreference and .ServerError to always be null
@Mpdreamz
Copy link
Member Author

Updated the PR to also have better MaxRetryException messages:

I.e when you issues a request but all the sniffs on all the nodes failed you now see this instead of nested MaxRetryExceptions

Elasticsearch.Net.Exceptions.MaxRetryException : Sniffing known nodes in the cluster caused a maxretry exception of its own
  ----> Elasticsearch.Net.Exceptions.SniffException : Sniffing known nodes in the cluster caused a maxretry exception of its own
  ----> Elasticsearch.Net.Exceptions.MaxRetryException : Failed after retrying 2 times: 'GET _nodes/_all/clear?timeout=50'. 
InnerException: PingException, InnerMessage: Pinging http://ipv4.fiddler:9201/ caused an exception, InnerStackTrace:    at Elasticsearch.Net.Connection.Transport.Ping(ITransportRequestState requestState) in C:\Projects\NEST\src\Elasticsearch.Net\Connection\Transport.cs:line 91
   at Elasticsearch.Net.Connection.Transport.DoRequest[T](TransportRequestState`1 requestState) in C:\Projects\NEST\src\Elasticsearch.Net\Connection\Transport.cs:line 326
InnerException: PingException, InnerMessage: Pinging http://ipv4.fiddler:9200/ 

Similary when all the pings fail you can clearly see which nodes were pinged and failed:

Elasticsearch.Net.Exceptions.MaxRetryException : Failed after retrying 2 times: 'GET '. 
InnerException: PingException, InnerMessage: Pinging http://ipv4.fiddler:9202/ caused an exception, InnerStackTrace:    at Elasticsearch.Net.Connection.Transport.Ping(ITransportRequestState requestState) in C:\Projects\NEST\src\Elasticsearch.Net\Connection\Transport.cs:line 91
   at Elasticsearch.Net.Connection.Transport.DoRequest[T](TransportRequestState`1 requestState) in C:\Projects\NEST\src\Elasticsearch.Net\Connection\Transport.cs:line 326
InnerException: PingException, InnerMessage: Pinging http://ipv4.fiddler:9201/ caused an exception, InnerStackTrace:    at Elasticsearch.Net.Connection.Transport.Ping(ITransportRequestState requestState) in C:\Projects\NEST\src\Elasticsearch.Net\Connection\Transport.cs:line 91
   at Elasticsearch.Net.Connection.Transport.DoRequest[T](TransportRequestState`1 requestState) in C:\Projects\NEST\src\Elasticsearch.Net\Connection\Transport.cs:line 326
InnerException: PingException, InnerMessage: Pinging http://ipv4.fiddler:9200/ caused an exception, InnerStackTrace:    at Elasticsearch.Net.Connection.Transport.Ping(ITransportRequestState requestState) in 

This PR also includes unit and integration tests to make sure exceptions bubble out the client the same for synchronous and asynchronous calls. We had unit tests for all of these but in practice things behave differently for instance HttpWebRequest by default only reads 65k of error response streams, our new integration caught this and this is now also fixed as off 856ad81.

@Mpdreamz
Copy link
Member Author

Tested this branch under mono too Mono does not support HttpWebRequest.DefaultMaximumErrorResponseLength so we are not calling it if we detect we are running under Mono.

gmarz added a commit that referenced this pull request Jul 18, 2014
…tion-failover

Failover broken when hard exceptions occured when using *Async  methods
@gmarz gmarz merged commit a92aac3 into develop Jul 18, 2014
@gmarz gmarz deleted the fix/hard-exceptions-connection-failover branch July 18, 2014 13:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants