Skip to content

Exception thrown trying to await requests tasks after multiple SearchAsync are called #8268

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
alessandro-gandolfi-ors opened this issue Jul 18, 2024 · 17 comments · Fixed by #8296
Labels
8.x Relates to a 8.x client version Area: Client Category: Bug

Comments

@alessandro-gandolfi-ors
Copy link

alessandro-gandolfi-ors commented Jul 18, 2024

Elastic.Clients.Elasticsearch version: 8.14.5

Elasticsearch version: 8.7.1

.NET runtime version: .NET 6

Operating system version: Windows 11

Description of the problem including expected versus actual behavior:
Calling multiple times SearchAsync and awaiting the responses after results in "Object reference not set to an instance of an object."

Steps to reproduce:

  1. Initialize a SearchRequest
  2. Call multiple times SearchAsync with the same SearchRequest and do NOT await the response, get the tasks instead
  3. Await the tasks execution after

Expected behavior
The tasks should be awaited correctly, returning the results of the searches.

Provide DebugInformation (if relevant):
This wasn't an issue with NEST and ES7, it was reproduced after trying to upgrade to Elastic.Clients.Elasticsearch and ES8.
It was reproduced initially by running some tests that worked before the updates, that called some SearchAsync and CountAsync consecutively. The exception was sometimes thrown by the CountAsync too, so it's not limited to the SearchAsync method.

The next screenshot is a minimal test that reproduces the behaviour reported. The exception is thrown at the second or third awaited task.
image

@alessandro-gandolfi-ors alessandro-gandolfi-ors added 8.x Relates to a 8.x client version Category: Bug labels Jul 18, 2024
@flobernd
Copy link
Member

Hi @alessandro-gandolfi-ors,

I tried to reproduce this issue, but was not able to. This is the code I've used:

var q = new SearchRequest("person")
{
	Query = Query.MatchAll(new MatchAllQuery())
};

var t1 = client.SearchAsync<JsonObject>(q);
var t2 = client.SearchAsync<JsonObject>(q);
var t3 = client.SearchAsync<JsonObject>(q);

await Task.WhenAny(t1, t2, t3);

It would really help, if you could please post the complete call-stack of the NullReferenceException.

@alessandro-gandolfi-ors
Copy link
Author

@flobernd I have censured some parts

This is the stacktrace:

System.NullReferenceException
  HResult=0x80004003
  Message=Object reference not set to an instance of an object.
  Source=Elastic.Clients.Elasticsearch
  StackTrace:
   at Elastic.Clients.Elasticsearch.ElasticsearchClient.<>c__DisplayClass28_0`3.<<DoRequestCoreAsync>g__SendRequest|0>d.MoveNext()
   at ***.Dao.DiagnosticService.<DiagnosticAsyncSearchRequests>d__3.MoveNext() in C:\Projects\***\Dao\DiagnosticService.cs:line 55

This is the callstack:
image

This is the callstack with external code enabled:
image

@flobernd
Copy link
Member

Hi @alessandro-gandolfi-ors, could you please post a complete standalone reproducer project/solution for me that creates and index, seeds some data and afterwards runs the failing tests? I'm still not able to reproduce the problem.

@flobernd
Copy link
Member

One thing to test before that maybe: What happens if you fire and await one initial request (e.g. InfoAsync()) using the same ElasticsearchClient instance, before executing the concurrent requests? If this works reliably, that would give me a strong hint in the right direction, I think.

Copy link
Contributor

github-actions bot commented Aug 6, 2024

This issue is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 2 days.

@alessandro-gandolfi-ors
Copy link
Author

@flobernd sorry for not replying but right now I'm working on other projects, I will make the example project as soon as I begin again to work on the one regarding this package.

The initial problem was encountered by firing a search request that was not awaited and right after one that was awaited. The second one threw the exception:
image

@flobernd
Copy link
Member

flobernd commented Aug 7, 2024

@alessandro-gandolfi-ors No worries about the delay. Before putting effort in creating such test/example project, could you please test this:

What happens if you fire and await one initial request (e.g. InfoAsync()) using the same ElasticsearchClient instance, before executing the concurrent requests?

So basically the other way around. The result of this test would be highly interesting.

@alessandro-gandolfi-ors
Copy link
Author

@flobernd I tried to create a test as you said and even when firing more requests after the first await, the test always succeded and I never encountered the exception.
image

The first test without the first await continues to throw the exception most of the times I run it.

@flobernd
Copy link
Member

Hi @alessandro-gandolfi-ors, thank you for testing this! That confirms my assumption. I'll make sure to harden the specific part of the code that is responsible for causing the race condition.

@alessandro-gandolfi-ors
Copy link
Author

Thank you @flobernd for the support, if you have any other test in mind that I could do I'd be glad to help

@flobernd
Copy link
Member

Should be fixed in 8.15.1, thanks for reporting @alessandro-gandolfi-ors! Please let me know, if the fix has fully resolved the issue.

@alessandro-gandolfi-ors
Copy link
Author

alessandro-gandolfi-ors commented Aug 21, 2024

@flobernd I tried updating to 8.15.2 but there are still issues when running tests. If I run them individually they succeed but if I try to run the whole class that has a bit less then 30 tests, almost half of them throw some sort of exception. I'm attaching some exceptions that were thrown by the tests to this comment. The exceptions are only thrown when I await the tasks after I create them, they do not happen if I await them at creation (on the same line).
log1.txt
log2.txt
log3.txt

@flobernd
Copy link
Member

Hi @alessandro-gandolfi-ors,

this is all speculative at this point, but is that maybe caused by some kind of network misconfiguration? The log1 and log2 contain a nginx 404 response:

    <html>
    <head><title>404 Not Found</title></head>
    <body>
    <center><h1>404 Not Found</h1></center>
    <hr><center>nginx</center>
    </body>
    </html>

As far as I know, there is no nginx involved in a standard Elasticsearch installation. Do you use some kind of proxy?

The log3 looks like some kind of network error as well:

 System.Net.Http.HttpIOException: The response ended prematurely. (ResponseEnded)

This can happen if the server (or a proxy) prematurely closes the connection. Do you maybe run into rate limits?

@alessandro-gandolfi-ors
Copy link
Author

@flobernd yes, the instance I'm using to run the tests is exposed through an nginx ingress, it is the same instance I use to run the tests both if I await the tasks on the same line of code (which works) or if I await them afterwards (which doesn't work). I checked the nginx ingress controller logs that manages this instance's ingress and found that all the requests have returned with http status code 200.

I'm running the same tests on a version that is still using nest/elasticsearch.net and es7 behind another nginx ingress with the tasks awaited afterwards and that one doesn't have these issues. These tests are run to be sure there are not regressions in the new version. I can't check the logs of this ingress controller but will do asap and update with another comment.

tl;dr:

  • new package, es8 and await on creation: works
  • new package, es8 and await afterwards: doesn't work
    • only when launching the tests together, if they are launched indipendentely they work
  • old packages, es7 and await on creation: works
  • old packages, es7 and await afterwards (original version of the project): works

@flobernd
Copy link
Member

Hi @alessandro-gandolfi-ors,

found that all the requests have returned with http status code 200.

How is that possible when the logs clearly say 404 and even include the nginx standard 404 output in the body?

A few more questions:

Does the index for e.g. the log1 mapping request exist?

  • new package, es8 and await afterwards: doesn't work
    • only when launching the tests together, if they are launched indipendentely they work

Could you please share some details about the tests? Is the same ElasticsearchClient instance re-used in all the test-cases of that test class? Are there any other objects (e.g. requests, etc.) that are shared-use between multiple test-cases?


I'm quite confused by the effects of this issue. What you describe in your tl;dr definitely looks like another concurrency issue. On the other hand, I have absolutely no idea how a concurrency issue would case the nginx 404 responses.

I think to find a solution more effectively, we have to somehow enable me to reproduce this problem on my machine. Would it be possible to provide a full reproducer example? If network aspects are involved, we might additional have to come up with some Docker solution or even set up a similar (cloud?) environment to the one you are using.

@alessandro-gandolfi-ors
Copy link
Author

Hi @flobernd, sorry for the trouble, it was a false alarm.

After searching more, I discovered that the proxy of the company I work for uses an additional ingress that is used only for the proxy itself. Bypassing that proxy and pointing directly to the nginx ingress controller exposing the es8 ingress (the logs I looked at were of this controller but the requests didn't arrive at it at all) makes the tests work.

As I said previously, the es7 instance was behind another proxy and cluster, so if the configuration of the proxy is different, that's why the tests worked with nest/elasticsearch.net on the previous version of the project.

Thanks again for the support!

@flobernd
Copy link
Member

Hi @alessandro-gandolfi-ors, thanks for updating the issue 🙂 I was already scratching my head, thinking about potential problems that could cause these strange side-effects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.x Relates to a 8.x client version Area: Client Category: Bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants