Return cursor id before row or send cursor as header in any format of sql response #31819

a-a-davydov · 2018-07-05T11:18:29Z

In the current version, the _xpack SQL REST API sends the cursor field in the response after the rows, but in some cases it is useful to know whether this response is the last batch of data. For example, if I want to store the result in another index using the bulk API, I would like to set refresh=wait_for only for the last batch.

Please send the cursor before the rows, or send a boolean header indicating whether the cursor is exhausted, for all response formats.

elasticmachine · 2018-07-05T12:09:22Z

Pinging @elastic/es-search-aggs

costin · 2018-07-05T12:12:46Z

Could you explain why you'd prefer having the cursor before the columns? Can you just add refresh=wait to the URL once the request has been fully parsed?

The header option isn't viable since the cursor can grow in size and pass the maximum header limit (see #16993).
The cursor could be moved up front to be consistent with the scroll API (potentially by adding a _ prefix as well).

a-a-davydov · 2018-07-05T12:21:57Z

> Could you explain why you'd prefer having the cursor before the columns? Can you just add refresh=wait to the URL once the request has been fully parsed?

I can open the index request in a separate thread and stream data from the response HTTP entity into that request. This avoids unnecessary memory allocation (only a few reusable buffers are needed to hold entries in a queue between the response and the request).

> The header option isn't viable since the cursor can grow in size and pass the maximum header limit

It can be just a boolean header indicating whether this response is the last one for the current cursor (i.e. whether a cursor field will follow at the end).

nik9000 · 2018-07-05T15:32:35Z

> It can be just a boolean header indicating whether this response is the last one for the current cursor (i.e. whether a cursor field will follow at the end).

I don't think we can do this consistently. It is possible in some queries for us to know that there won't be other batches but for others we can't know until we pull that batch. Since we don't store state in SQL and we don't want to store state in SQL we'd only ever be able to tell you when we're sure that there isn't another row. But if we tell you there is another row there might not be one.
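The point above can be sketched from the client side. This is a minimal illustration, not the SQL plugin's own client: `fetch` is a hypothetical callable that posts the given cursor (or the initial query when the cursor is `None`) and returns the parsed JSON body, assumed to carry `rows` and, when more data may exist, `cursor`.

```python
from typing import Callable, Iterator, List, Optional, Tuple


def iter_batches(fetch: Callable[[Optional[str]], dict]) -> Iterator[Tuple[List, bool]]:
    """Yield (rows, is_last) for each page returned by the hypothetical `fetch`."""
    cursor = None
    while True:
        page = fetch(cursor)
        # The cursor is only known once the whole page has been parsed.
        cursor = page.get("cursor")
        yield page.get("rows", []), not cursor
        if not cursor:
            return
```

Note that a page carrying a cursor only means there may be more rows; the following page can still come back empty, which is the case described above.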

> For example, if I want to store the result in another index using the bulk API, I would like to set refresh=wait_for only for the last batch.

I don't think this'll work in all cases. refresh=wait_for only waits for refreshes on the shards that received documents.

You might be better off making all of the bulk writes to Elasticsearch asynchronously and adding refresh=wait_for to all of them and then waiting for them all to succeed.
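A minimal sketch of that suggestion, assuming a hypothetical `send_bulk(batch, refresh)` callable that issues one `_bulk` request with the given `refresh` value and returns the parsed response; the function name, signature, and worker count are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def write_all(batches: Iterable[list],
              send_bulk: Callable[[list, str], dict],
              workers: int = 4) -> List[dict]:
    """Submit every batch with refresh=wait_for, then wait for all of them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(send_bulk, batch, "wait_for") for batch in batches]
        # result() re-raises any exception from send_bulk,
        # so a failed batch is not silently dropped.
        return [f.result() for f in futures]
```

Because every request waits for its own shards to refresh, the caller does not need to know which batch is the last one.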

a-a-davydov · 2018-07-05T17:15:24Z

> I don't think this'll work in all cases. refresh=wait_for only waits for refreshes on the shards that received documents.

Thanks, that is important. It would be good to update the ?refresh and _bulk documentation.

> I don't think we can do this consistently. It is possible in some queries for us to know that there won't be other batches but for others we can't know until we pull that batch. Since we don't store state in SQL and we don't want to store state in SQL we'd only ever be able to tell you when we're sure that there isn't another row. But if we tell you there is another row there might not be one.

I am not asking to change any system behaviour. I am fine with the "if we tell you there is another row there might not be one" behaviour, because if you say there is no other row, there is no other row. And if I get this information before fully parsing 1000 records, I can use it for some optimisations.

An additional header is better than changing the JSON field order, because JSON semantics ignore field order and it is bad practice to rely on JSON field order in application logic.
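A small illustration of the field-order point, using made-up response bodies: both orderings parse to the same object, so a client should look `cursor` up by name, not by position.

```python
import json

# Two hypothetical bodies that differ only in field order.
early = json.loads('{"cursor": "abc", "rows": [[1], [2]]}')
late = json.loads('{"rows": [[1], [2]], "cursor": "abc"}')

# The parsed objects compare equal; field order carries no meaning.
same = early == late

# Look the field up by name, never by position.
cursor = late.get("cursor")
```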

Only the shards that receive the bulk request will be affected by `refresh`. Imagine a `_bulk?refresh=wait_for` request with three documents in it that happen to be routed to different shards in an index with five shards. The request will only wait for those three shards to refresh. The other two shards that make up the index do not participate in the `_bulk` request at all. Relates to elastic#31819
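The scenario in that note can be sketched as follows. This is a simplified model only: `zlib.crc32` stands in for Elasticsearch's own routing hash, and the document IDs are made up.

```python
import zlib
from typing import Iterable, Set


def shards_touched(doc_ids: Iterable[str], num_shards: int) -> Set[int]:
    """Return the shard numbers that would receive at least one document."""
    # crc32 is a stand-in hash for illustration only;
    # Elasticsearch uses its own routing hash.
    return {zlib.crc32(doc_id.encode("utf-8")) % num_shards for doc_id in doc_ids}


touched = shards_touched(["a", "b", "c"], num_shards=5)
# Shards outside `touched` take no part in the request,
# so refresh=wait_for does not wait on them.
idle = set(range(5)) - touched
```

With three documents and five shards, at most three shards are touched, so at least two are never waited on.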

elasticsearchmachine · 2024-01-17T21:37:42Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

wchaparro · 2024-03-20T20:44:24Z

Superseded by ES|QL

costin added the >enhancement and :Analytics/SQL labels Jul 5, 2018

costin added the team-discuss label Jul 5, 2018

colings86 removed the team-discuss label Aug 7, 2018

rjernst added the Team:QL (Deprecated) label May 4, 2020

wchaparro removed the Team:QL (Deprecated) label Jan 17, 2024

elasticsearchmachine added the Team:Analytics label Jan 17, 2024

wchaparro closed this as not planned Mar 20, 2024
