Failover broken when hard exceptions occurred when using *Async methods #803

Mpdreamz · 2014-07-17T08:52:36Z

When IConnection throws a hard exception in any of the *Async methods before any actual async work has started, the async routines in Transport.cs do not fail over properly, because they only handle tasks in a faulted state.

In most cases the exception occurs in a way that causes a faulted task to be returned (for example, when reading responses asynchronously), but some exceptions happen earlier, such as during DNS lookups (see #802). These lookups are always synchronous and will throw a hard exception.

This PR also fixes the case where a hard exception during pinging or sniffing with the Async methods could prevent a connection from failing over.

The synchronous versions all behave correctly during hard exceptions.
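
To make the distinction concrete, here is a minimal sketch of the two failure modes and one way to normalize them. It is not the code in Transport.cs: `FailoverSketch`, `Guard`, and `RequestWithFailover` are hypothetical names, the `call` delegate stands in for an IConnection *Async method, and the final `InvalidOperationException` is a placeholder for whatever the transport would really throw.

```csharp
using System;
using System.Threading.Tasks;

public static class FailoverSketch
{
    // Wraps a Task-returning call so that an exception thrown synchronously,
    // before any Task exists, is converted into a faulted Task.
    public static Task<string> Guard(Func<Uri, Task<string>> call, Uri node)
    {
        try
        {
            return call(node);
        }
        catch (Exception e)
        {
            var tcs = new TaskCompletionSource<string>();
            tcs.SetException(e);
            return tcs.Task;
        }
    }

    // Tries nodes[index]; when the resulting task faults, moves on to the next node.
    public static Task<string> RequestWithFailover(
        Func<Uri, Task<string>> call, Uri[] nodes, int index = 0)
    {
        if (index >= nodes.Length)
        {
            // Placeholder failure once every node has been tried.
            var tcs = new TaskCompletionSource<string>();
            tcs.SetException(new InvalidOperationException("All nodes failed."));
            return tcs.Task;
        }

        return Guard(call, nodes[index])
            .ContinueWith(t =>
            {
                if (!t.IsFaulted) return t;
                var observed = t.Exception; // mark the fault as observed
                return RequestWithFailover(call, nodes, index + 1);
            })
            .Unwrap();
    }
}
```

Without `Guard`, a call that throws before returning a task never reaches `ContinueWith`, so the continuation that would pick the next node is never attached. That is the gap described above.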

Mpdreamz · 2014-07-18T10:40:38Z

Updated the PR to also produce better MaxRetryException messages:

For example, when you issue a request but the sniffs on all the nodes fail, you now see this instead of nested MaxRetryExceptions:

Elasticsearch.Net.Exceptions.MaxRetryException : Sniffing known nodes in the cluster caused a maxretry exception of its own
  ----> Elasticsearch.Net.Exceptions.SniffException : Sniffing known nodes in the cluster caused a maxretry exception of its own
  ----> Elasticsearch.Net.Exceptions.MaxRetryException : Failed after retrying 2 times: 'GET _nodes/_all/clear?timeout=50'. 
InnerException: PingException, InnerMessage: Pinging http://ipv4.fiddler:9201/ caused an exception, InnerStackTrace:    at Elasticsearch.Net.Connection.Transport.Ping(ITransportRequestState requestState) in C:\Projects\NEST\src\Elasticsearch.Net\Connection\Transport.cs:line 91
   at Elasticsearch.Net.Connection.Transport.DoRequest[T](TransportRequestState`1 requestState) in C:\Projects\NEST\src\Elasticsearch.Net\Connection\Transport.cs:line 326
InnerException: PingException, InnerMessage: Pinging http://ipv4.fiddler:9200/

Similarly, when all the pings fail, you can clearly see which nodes were pinged and failed:

Elasticsearch.Net.Exceptions.MaxRetryException : Failed after retrying 2 times: 'GET '. 
InnerException: PingException, InnerMessage: Pinging http://ipv4.fiddler:9202/ caused an exception, InnerStackTrace:    at Elasticsearch.Net.Connection.Transport.Ping(ITransportRequestState requestState) in C:\Projects\NEST\src\Elasticsearch.Net\Connection\Transport.cs:line 91
   at Elasticsearch.Net.Connection.Transport.DoRequest[T](TransportRequestState`1 requestState) in C:\Projects\NEST\src\Elasticsearch.Net\Connection\Transport.cs:line 326
InnerException: PingException, InnerMessage: Pinging http://ipv4.fiddler:9201/ caused an exception, InnerStackTrace:    at Elasticsearch.Net.Connection.Transport.Ping(ITransportRequestState requestState) in C:\Projects\NEST\src\Elasticsearch.Net\Connection\Transport.cs:line 91
   at Elasticsearch.Net.Connection.Transport.DoRequest[T](TransportRequestState`1 requestState) in C:\Projects\NEST\src\Elasticsearch.Net\Connection\Transport.cs:line 326
InnerException: PingException, InnerMessage: Pinging http://ipv4.fiddler:9200/ caused an exception, InnerStackTrace:    at Elasticsearch.Net.Connection.Transport.Ping(ITransportRequestState requestState) in
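
As a rough illustration of how a flat message like the ones above could be assembled, the sketch below joins the exceptions seen on each attempt into a single string. `RetryMessageSketch` and `Describe` are hypothetical names and this is not the formatting code from the PR; it only mirrors the shape of the output shown.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class RetryMessageSketch
{
    // Builds one flat message from the exceptions collected on each attempt,
    // instead of nesting one retry exception inside another.
    public static string Describe(
        string method, string path, int retries, IEnumerable<Exception> seen)
    {
        var sb = new StringBuilder();
        sb.AppendFormat("Failed after retrying {0} times: '{1} {2}'.", retries, method, path);
        foreach (var e in seen)
        {
            sb.AppendLine();
            sb.AppendFormat(
                "InnerException: {0}, InnerMessage: {1}, InnerStackTrace: {2}",
                e.GetType().Name, e.Message, e.StackTrace);
        }
        return sb.ToString();
    }
}
```

Listing every attempt at the same level keeps the failed nodes visible at a glance, which is harder when each failure is wrapped inside the previous one.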

This PR also includes unit and integration tests to make sure exceptions bubble out of the client the same way for synchronous and asynchronous calls. We had unit tests for all of these, but in practice things behave differently. For instance, HttpWebRequest by default only reads 65k of error response streams. Our new integration tests caught this, and it is now also fixed as of 856ad81.

Mpdreamz · 2014-07-18T12:06:48Z

Tested this branch under Mono too. Mono does not support HttpWebRequest.DefaultMaximumErrorResponseLength, so we do not set it if we detect that we are running under Mono.
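
A minimal sketch of that guard, assuming the common `Type.GetType("Mono.Runtime")` idiom for detecting Mono (the PR text does not say which detection method is used). `ErrorResponseLimitSketch` and `RemoveErrorResponseLimit` are hypothetical names; `HttpWebRequest.DefaultMaximumErrorResponseLength` is the real static property, and setting it to -1 removes the size limit.

```csharp
using System;
using System.Net;

public static class ErrorResponseLimitSketch
{
    // Common idiom for detecting Mono: the Mono.Runtime type only exists there.
    // Assumption: the PR may detect Mono differently.
    public static readonly bool IsMono = Type.GetType("Mono.Runtime") != null;

    // Lifts the cap on how much of an HTTP error response body is read.
    // Returns true when the setting was applied.
    public static bool RemoveErrorResponseLimit()
    {
        if (IsMono) return false; // per the comment above, Mono does not support this property
        HttpWebRequest.DefaultMaximumErrorResponseLength = -1; // -1 means no limit
        return true;
    }
}
```

Because the property is static and process-wide, a call like this would typically run once, for example from a static constructor.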

Mpdreamz added 3 commits July 17, 2014 09:58

#802 added additional checks if hard exceptions occur in IConnection (not only faulted tasks, synchronous version was ok but async version did not failover properly

736160d

added unit tests where the sniff endpoint throws a hard exception in IConnection

12f90f7

added tests for hard exceptions in IConnection async methods

1851b1b

Mpdreamz mentioned this pull request Jul 17, 2014

Using connection pools throws SocketException #802

Closed

Mpdreamz added 6 commits July 17, 2014 23:52

Improved exception messages to include ping/sniff exceptions more clearly as suppose to inner maxretry exceptions. Added integration tests for the exception (which behave different in practice then theory (unit tests

e3784cc

fixed failing unit test in because I tried to see if picking a random node if all the nodes are dead over picking the one thats dead longest would work better in practice

93c9f21

fixed async exceptions bubling appropiately

a2127de

Wrote more unit tests that check wheter .ServerError is properly set. Works in unit tests not in integration tests..

502975b

fixed error responses that exceeded 65kb being truncated (sigh WebRequest) causing ElasticsearchServerException to throw a nullreference and .ServerError to always be null

856ad81

Show stacktraces in MaxRetryException message

2dae577

Mpdreamz added 3 commits July 18, 2014 12:51

fixed failing unit tests

716f28d

ignored startup tests for now

ff57f4e

fixed mono builds and made sure the HttpConnection changes do not affect Mono

ee178c7

Mpdreamz and others added 2 commits July 18, 2014 14:45

updated test that succeeded because fiddler was running

e2c8ad4

Fix casing

3309955

gmarz added a commit that referenced this pull request Jul 18, 2014

Merge pull request #803 from elasticsearch/fix/hard-exceptions-connection-failover Failover broken when hard exceptions occured when using *Async methods

a92aac3

gmarz merged commit a92aac3 into develop Jul 18, 2014

gmarz deleted the fix/hard-exceptions-connection-failover branch July 18, 2014 13:03
