Skip to content

Rest client: provide the ability to define a specific timeout for a request #21789

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
gsmet opened this issue Nov 24, 2016 · 20 comments
Closed
Labels
:Clients/Java Low Level REST Client Minimal dependencies Java Client for Elasticsearch >enhancement help wanted adoptme Team:Data Management Meta label for data/management team

Comments

@gsmet
Copy link

gsmet commented Nov 24, 2016

Hi!

Describe the feature:

Currently, the new Rest client only allows to set a default RequestConfigCallback globally. It would be nice if we could set a RequestConfigCallback specifically for a request.

Typically, it can be very useful when you want to set a specific timeout for a long operation. It would be nice if this RequestConfigCallback could be provided with the information of the request (so that for instance you could set a different timeout for different types of requests).

--
Guillaume

@clintongormley clintongormley added :Clients/Java Low Level REST Client Minimal dependencies Java Client for Elasticsearch >enhancement labels Nov 25, 2016
@javanna
Copy link
Member

javanna commented Nov 28, 2016

Hi @gsmet thanks for reporting this.
I considered adding per request timeout in the first place, and the main reason not to do it was that it would require yet another performRequest method variant that accepts request config. It would be easy to do introducing some generic request object or builder around it, but we tried to keep things simple in the low-level client, and minimize the abstraction layer we add on top of the apache http client library we use under the hood.

How big of an overhead would be for you to use a specific client instance for long running operations? Do you have an example of requests that would need that?

@gsmet
Copy link
Author

gsmet commented Nov 28, 2016

Hi @javanna,

For instance, it might be useful when you use the bulk update API. This API call might take significantly longer that what you would expect for a "normal" search query. Setting the global timeout higher is not a good solution IMHO.

In Hibernate Search (https://github.com/hibernate/hibernate-search - we are working on the Elasticsearch support for the next version), we also allow the user to set the timeout of a query. This is used to set the Lucene timeout in the Lucene backend and the Elasticsearch timeout in the Elasticsearch backend but, if we are using the Rest client, we need to be sure the HTTP timeout is higher than that.

That being said, I think you're right about the number of method arguments and maybe a DSL would be in order.

To be completely thorough, I tried to implement a backend for Jest (https://github.com/searchbox-io/Jest) based on the Elasticsearch rest client the other day and I found that the Jest client has a lot of useful options (the reaper for connections idle for too long being one but there are a few others which might be interesting).

I think taking a look to what is done there might give you some good ideas. That being said, they do not support setting the timeout per query atm either.

@javanna
Copy link
Member

javanna commented Nov 28, 2016

This is used to set the Lucene timeout in the Lucene backend and the Elasticsearch timeout in the Elasticsearch backend but, if we are using the Rest client, we need to be sure the HTTP timeout is higher than that.

I think you mean the elasticsearch timeout parameter in the search api (please correct me if I am wrong). That one bounds the search request to be executed within the specified time value, or to return only the hits accumulated upon expiration. It means "stop if you are not done after n seconds". Some other apis support a timeout parameter (apis that update the cluster state) which has though a very different meaning depending on the api (maybe bad naming).

The socket timeout is quite different though, it is the timeout when waiting for individual packets, it doesn't mean the request will never take more than n seconds overall. That said I am not sure these two timeout mechanisms should be related to each other.

There may still be use-cases for increasing the socket timeout per request though, we just need to find a good way to do it in the low-level client. Also important if we want to expose it to the high-level client that is in the works.

@javanna
Copy link
Member

javanna commented Dec 2, 2016

We discussed this in our FixItFriday and we are still not convinced that we should expose the socket timeout per request. I explained above why the search timeout example shouldn't necessarily require to increase the socket timeout too, please let me know if that makes sense. Did you actually experience socket timeout problems when playing with search timeout? Or was that a theoretical use-case?

@yrodiere
Copy link
Contributor

Hello,

Did you actually experience socket timeout problems when playing with search timeout? Or was that a theoretical use-case?

We did experience socket timeout issues in one particular case. In Hibernate Search, we setup indexes for users and then allow users to use those indexes (index, search). But in the initial setup phase, we have to wait for the index to reach a particular status (yellow or green). This is done through a request to the Health API, with a particularly long timeout, because in this particular case we can wait. Which means that, for this single Health API call to work, we have to raise the socket timeout to a high value, which will affect other calls (search API in particular).

This is just an example, though: some requests to the Search API could require a longer timeout too, but only some of them.

Here are the workarounds we could think of, and why they're not satisfactory:

  • Raising the timeout globally and relying on the timeouts defined in API parameters.
    This is not a good idea because, if a node starts to fail, we may end up with very simple, normally very short requests taking a long time to execute (to ultimately time out) and finally clogging up the HTTP client and slowing down our own client platform. Whereas with correct socket timeouts, we could benefit from the failover feature of the REST client.
  • Setting up multiple RestClients
    This is not easily achievable in our case, because our own APIs already provide finer-grained tuning (a specific timeout per request). For instance each search request can have a specific timeout, defined at query time to the millisecond. So we'd basically have to create one client per request...

Configurable timeouts are a feature we provide when searching on a local Lucene instance, and it would just feel weird that when the indexing service is remote (and thus timeout issues are more relevant), we provide less control.

@javanna
Copy link
Member

javanna commented Feb 28, 2017

Hi @yrodiere thanks for the detailed explanation. The call to cluster health after creating an index should not be needed anymore against elasticsearch 5.x. Also, I wonder how long you would want to wait for green, 30 seconds (which is the default socket timeout) seems already quite high to me.

I do agree that having to globally increase the timeout is not ideal, as well as having separate client instances just because you need a different socket timeout. But I would like to understand first whether you really need a higher timeout in those calls and why.

@yrodiere
Copy link
Contributor

yrodiere commented Feb 28, 2017

The call to cluster health after creating an index should not be needed anymore against elasticsearch 5.x.

Thanks for the info. We target ES2 too, though. Not because we want to, but, you know, there are still users out there.

Also, I wonder how long you would want to wait for green, 30 seconds (which is the default socket timeout) seems already quite high to me.

We use 60s as a default. 30s would probably be alright, I agree. But I also think 30s is way too high for a global default. 30s for a random, reasonably tuned search requests (low hits size, sorts on optimized fields, etc.) is way too much. When your requests are used to build web pages, you generally don't want to wait for more than, say, one second.
So you'll want to either set the global timeout to 1s and override it for longer requests, or leave it to 30s and override it for shorter requests. Neither of which you can do reliably right now, because your only way to set timeouts is through the API, and that doesn't help when you're facing network or node failures.

But I would like to understand first whether you really need a higher timeout in those calls and why.

The problem is not so much that we can't set arbitrarily high timeouts, but it's more that we can't set multiple, different timeouts. With a Health request I'd want something like 30s. With a create index query, maybe something similar, 30s. With a normal search query I'd want something like 1s. With a very heavy search query (executed for some rarely used report, and thus not optimized at all), maybe 5s or 10s would be required.

Anyway, I wonder why we need to argue on whether high timeouts are needed. The Elasticsearch APIs provides ways to set timeouts. We established that currently, the non-configurable socket timeout gets in the way of these API timeouts. So, unless there are there plans to drop the API timeouts, I'd say we need to add something to make them usable on the client... don't we?

@javanna
Copy link
Member

javanna commented Feb 28, 2017

heya @yrodiere I wouldn't call this arguing, I am asking questions to better understand the usecase, as the change requires modifying public apis to provide the timeout per request. I am looking for the right compromise between making things configurable, yet keeping them simple, which is a challenge and I am sure you know that too :)

I definitely agree with you that the default of 30 seconds is way too high, I would advocate to have a 1 second default too, but users asked us to raise it and we saw in practice that 1 second was too low. Our reindex from remote api uses the REST client and we saw in practice the same problems, which convinced us to raise the default.

I tend to be in favour of exposing the timeout per request, my concerns are mainly around the fact that it can be misused. For instance when you mention search and apis that expose a timeout parameter, I would ask you to read my reply above. The semantics of the timeout query_string parameter varies api by api and is different compared to socket timeout.

I am marking this for discussion so we can talk about it again with the team and figure out what we want to do given the info you provided. Thank you!

@yrodiere
Copy link
Contributor

I tend to be in favour of exposing the timeout per request, my concerns are mainly around the fact that it can be misused.

Ah, I can only promise I will try to use it right :) But yes, more configuration options usually means more ways to blow up in your face.

I am marking this for discussion so we can talk about it again with the team and figure out what we want to do given the info you provided. Thank you!

Looking forward to hearing from you then. Thanks!

@yrodiere
Copy link
Contributor

Note: I just discovered there is another global timeout that takes precedence over all other timeouts. It is org.elasticsearch.client.RestClient.maxRetryTimeoutMillis, and it is used in performRequest, so any "synchronous" request that takes longer than this will time out.

If a solution is ever implemented, it probably should allow to raise this timeout per-request, too; otherwise raising the socket timeout will be useless.

Was there any discussion about the issue? Do you think there is a chance you would accept a pull request?

@javanna
Copy link
Member

javanna commented Mar 22, 2017

Right, I had forgotten about maxRetryTimeout. That one is about making sure that the whole request, including retries, doesn't take longer that a certain amount of time. Its default value is the same as the default socket timeout, but they are two very different things as one is about the whole duration of the request, while the second applies to each single request and it is applied when waiting for individual packets. I am now wondering if these two timeouts should have a different default value.

Good point, if we allow to change the socket timeout per request, also max retry should be changed per request, probably we should allow to pass in some kind of request config argument like the apache http client does. I don't know yet what we want to do, sorry we haven't discussed this yet with the team, I will bring it up asap and let you know.

@javanna javanna added help wanted adoptme and removed discuss labels May 5, 2017
dpitera added a commit to dpitera/janusgraph that referenced this issue Nov 1, 2017
The ES RestClient does [not
support](elastic/elasticsearch#21789 (comment))
a per request timeout configuration, however it does at least support a
timeout configuration option `maxRetriesInMillis` that will timeout the
overall request, including retries. It can be useful for others to
configure this timeout, especially given the fact that users can
configure their gremlin-script-execution timeout.

Issues: JanusGraph#688
dpitera added a commit to dpitera/janusgraph that referenced this issue Nov 1, 2017
The ES RestClient does [not
support](elastic/elasticsearch#21789 (comment))
a per request timeout configuration, however it does at least support a
timeout configuration option `maxRetriesInMillis` that will timeout the
overall request, including retries. It can be useful for others to
configure this timeout, especially given the fact that users can
configure their gremlin-script-execution timeout.

Issues: JanusGraph#688
Signed-off-by: dpitera <[email protected]>
dpitera added a commit to dpitera/janusgraph that referenced this issue Nov 1, 2017
The ES RestClient does [not
support](elastic/elasticsearch#21789 (comment))
a per request timeout configuration, however it does at least support a
timeout configuration option `maxRetriesInMillis` that will timeout the
overall request, including retries. It can be useful for others to
configure this timeout, especially given the fact that users can
configure their gremlin-script-execution timeout.

Issues: JanusGraph#688

Signed-off-by: David Pitera <[email protected]>
dpitera added a commit to dpitera/janusgraph that referenced this issue Nov 2, 2017
The ES RestClient does [not
support](elastic/elasticsearch#21789 (comment))
a per request timeout configuration, however it does at least support a
timeout configuration option `maxRetriesInMillis` that will timeout the
overall request, including retries. It can be useful for others to
configure this timeout, especially given the fact that users can
configure their gremlin-script-execution timeout.

Issues: JanusGraph#688

Signed-off-by: David Pitera <[email protected]>
dpitera added a commit to dpitera/janusgraph that referenced this issue Nov 16, 2017
The ES RestClient does [not
support](elastic/elasticsearch#21789 (comment))
a per request timeout configuration, however it does at least support a
timeout configuration option `maxRetriesInMillis` that will timeout the
overall request, including retries. It can be useful for others to
configure this timeout, especially given the fact that users can
configure their gremlin-script-execution timeout.

Issues: JanusGraph#688

Signed-off-by: David Pitera <[email protected]>
robertdale pushed a commit to JanusGraph/janusgraph that referenced this issue Nov 17, 2017
The ES RestClient does [not
support](elastic/elasticsearch#21789 (comment))
a per request timeout configuration, however it does at least support a
timeout configuration option `maxRetriesInMillis` that will timeout the
overall request, including retries. It can be useful for others to
configure this timeout, especially given the fact that users can
configure their gremlin-script-execution timeout.

Issues: #688

Signed-off-by: David Pitera <[email protected]>
@nik9000 nik9000 removed the help wanted adoptme label Mar 27, 2018
@nik9000 nik9000 self-assigned this Mar 27, 2018
@nik9000
Copy link
Member

nik9000 commented Mar 27, 2018

I think I'd like to have a look at this after #29211. #29211 gives us a clean way forward to set things like the timeout globally or for a single request so I think it is worth grabbing this after that.

@dannycohn
Copy link

I may have a use case that requires this:

We are creating a new index and bulk loading it. Per the ES recommendation, we are starting out at 0 replicas, and only once the index is fully loaded do we set the replicas higher. Once we do that, the index health goes yellow while it copies the data to the other nodes. We don't want to set the index live until the replicas are ready, so we're calling the health endpoint with wait_for_no_initializing_shards set to true. This can often take more than 30 seconds.

Thoughts?

@javanna javanna removed their assignment Aug 3, 2018
@nik9000 nik9000 removed their assignment Sep 23, 2018
@nik9000 nik9000 added the help wanted adoptme label Sep 23, 2018
bwatson-rti-org pushed a commit to bwatson-rti-org/janusgraph that referenced this issue Mar 9, 2019
The ES RestClient does [not
support](elastic/elasticsearch#21789 (comment))
a per request timeout configuration, however it does at least support a
timeout configuration option `maxRetriesInMillis` that will timeout the
overall request, including retries. It can be useful for others to
configure this timeout, especially given the fact that users can
configure their gremlin-script-execution timeout.

Issues: JanusGraph#688

Signed-off-by: David Pitera <[email protected]>
micpod pushed a commit to micpod/janusgraph that referenced this issue Nov 5, 2019
The ES RestClient does [not
support](elastic/elasticsearch#21789 (comment))
a per request timeout configuration, however it does at least support a
timeout configuration option `maxRetriesInMillis` that will timeout the
overall request, including retries. It can be useful for others to
configure this timeout, especially given the fact that users can
configure their gremlin-script-execution timeout.

Issues: JanusGraph#688
@rjernst rjernst added the Team:Data Management Meta label for data/management team label May 4, 2020
@dorony
Copy link

dorony commented Jul 27, 2020

Hi, I'd like to add our use case here as well.
We are developing a latency sensitive application that uses ES. It is critical for us to be able to set a different client side timeout per request. Each ES request in our system has a different latency limit. Some can take at most 100ms and some can take at most a few seconds.
This is supported in other ES clients (such as the request_timeout parameter in the python client).

With the removal of maxRetryTimeout in the 7.X version of the client it's unclear to me how to even set a client side timeout at all (not a socket timeout), let alone one that is different per request. Could you elaborate on the plan here?

@yrodiere
Copy link
Contributor

Wasn't this solved in 7.9 by #57972 ? From what I understand, one can now provide per-request configuration, including timeouts.

@dorony
Copy link

dorony commented Aug 27, 2020

@yrodiere seems to me this only provides the ability to select a socket timeout (timeout when waiting for individual packets), and not a real client side timeout, like the maxRetryTimeout achieves.

@yrodiere
Copy link
Contributor

@dorony As you noticed, maxRetryTimeout was removed some time ago. I believe the socket timeout is the only one getting in the way, and #57972 allows to raise it on a per-request basis.

This ticket (created by a coworker of mine) was initially about being able to raise the existing timeouts on a per-request basis so that a long-running operation (bulk updates, health check, ...) can be executed from time to time without raising the timeouts globally to some absurdly large value (e.g. 60s). As far as I can tell, that should now be doable. I still have to try and implement it in my library, though.

As for the client timeout, I personally implement it with a separate ScheduledExecutorService that cancels the (async) request after some time. Not as easy as it could be, but at least it's doable. I agree it would be nice to have that in the REST client too, but that looks like a different problem?

@dorony
Copy link

dorony commented Aug 27, 2020

Thanks @yrodiere. I think that from a user perspective a real client side timeout (implemented with a ScheduledExecutorService or otherwise) is what we'd expect, and more in line with other ES clients have (like the python one). I would love for a simple requestTimeout parameter that ensures the request doesn't exceed that timeout.

While implementing it outside the client is possible, it doesn't provide for the best experience - for example I won't be able to see which host the request has timed out against. Do you think I should open a different issue for that?

@yrodiere
Copy link
Contributor

Do you think I should open a different issue for that?

I do. But I'm not a developer of this client, I'm a user just like you :)

@dakrone
Copy link
Member

dakrone commented May 8, 2024

This has been open for quite a while, and hasn't had a lot of interest. For now I'm going to close this as something we aren't planning on implementing. We can re-open it later if needed. We also have the https://github.com/elastic/elasticsearch-java/ which is under more active development.

@dakrone dakrone closed this as not planned Won't fix, can't repro, duplicate, stale May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Clients/Java Low Level REST Client Minimal dependencies Java Client for Elasticsearch >enhancement help wanted adoptme Team:Data Management Meta label for data/management team
Projects
None yet
Development

No branches or pull requests

9 participants