Skip to content

Return cursor id before row or send cursor as header in any format of sql response #31819

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
a-a-davydov opened this issue Jul 5, 2018 · 7 comments
Labels
:Analytics/SQL SQL querying >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@a-a-davydov
Copy link

In current version _xpack sql rest api send cursor field in response after rows, but in some cases, it is useful to know if this response is last batch of data. For example, if I want to store result in other index using bulk api, i woud like to set refresh=wait_for only for the last batch.

Please, send cursor before rows, or send boolean header if cursor is over for all response formats

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@costin
Copy link
Member

costin commented Jul 5, 2018

Could you explain why you'd prefer having the cursor before the columns? Can you just add refresh=wait to the URL once the request has been fully parsed?

The header option isn't viable since the cursor can grow in size and pass the maximum header limit (see #16993).
The cursor could be moved up front to be consistent with the scroll API (potentially by adding a _ prefix as well).

@a-a-davydov
Copy link
Author

Could you explain why you'd prefer having the cursor before the columns? Can you just add refresh=wait to the URL once the request has been fully parsed?

I can open index request in separate thread and stream data from from response http entity to request. It helps to avoid unnecessery memory allocation (needs to have only some reusable buffers to store entries in queue between response and request)

The header option isn't viable since the cursor can grow in size and pass the maximum header limit

It can be just boolean header indicates if this response is the last for current cursor. (i.e. if it will be cursor field at the end)

@nik9000
Copy link
Member

nik9000 commented Jul 5, 2018

It can be just boolean header indicates if this response is the last for current cursor. (i.e. if it will be cursor field at the end)

I don't think we can do this consistently. It is possible in some queries for us to know that there won't be other batches but for others we can't know until we pull that batch. Since we don't store state in SQL and we don't want to store state in SQL we'd only ever be able to tell you when we're sure that there isn't another row. But if we tell you there is another row there might not be one.

For example, if I want to store result in other index using bulk api, i woud like to set refresh=wait_for only for the last batch.

I don't think this'll work in all cases. refresh=wait_for only waits for refreshes on the shards that received documents.

You might be better off making all of the bulk writes to Elasticsearch asynchronously and adding refresh=wait_for to all of them and then waiting for them all to succeed.

@a-a-davydov
Copy link
Author

a-a-davydov commented Jul 5, 2018

I don't think this'll work in all cases. refresh=wait_for only waits for refreshes on the shards that received documents.

thanks, it is important. It will be good to update ?refresh and _bulk documentation

I don't think we can do this consistently. It is possible in some queries for us to know that there won't be other batches but for others we can't know until we pull that batch. Since we don't store state in SQL and we don't want to store state in SQL we'd only ever be able to tell you when we're sure that there isn't another row. But if we tell you there is another row there might not be one.

I don't ask to change any system behaviour. I meet But if we tell you there is another row there might not be one behaviour, because if you say there is no another row, there is no another row. And if I get this information before fully parse 1000 records I may use this fact for some optimisations.

Additional header is better than change JSON field order, because JSON semantic ignores field order and it is bad practice to use json field order in application logic.

nik9000 added a commit to nik9000/elasticsearch that referenced this issue Jul 5, 2018
Only the shards that receive the bulk request will be affected by
`refresh`. Imagine a `_bulk?refresh=wait_for` request with three
documents in it that happen to be routed to different shards in an index
with five shards. The request will only wait for those three shards to
refresh. The other two shards of that make up the index do not
participate in the `_bulk` request at all.

Relates to elastic#31819
nik9000 added a commit that referenced this issue Jul 5, 2018
Only the shards that receive the bulk request will be affected by
`refresh`. Imagine a `_bulk?refresh=wait_for` request with three
documents in it that happen to be routed to different shards in an index
with five shards. The request will only wait for those three shards to
refresh. The other two shards of that make up the index do not
participate in the `_bulk` request at all.

Relates to #31819
nik9000 added a commit that referenced this issue Jul 5, 2018
Only the shards that receive the bulk request will be affected by
`refresh`. Imagine a `_bulk?refresh=wait_for` request with three
documents in it that happen to be routed to different shards in an index
with five shards. The request will only wait for those three shards to
refresh. The other two shards of that make up the index do not
participate in the `_bulk` request at all.

Relates to #31819
@rjernst rjernst added the Team:QL (Deprecated) Meta label for query languages team label May 4, 2020
@wchaparro wchaparro removed the Team:QL (Deprecated) Meta label for query languages team label Jan 17, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 17, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@wchaparro
Copy link
Member

superceded by ES|QL

@wchaparro wchaparro closed this as not planned Won't fix, can't repro, duplicate, stale Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/SQL SQL querying >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

No branches or pull requests

8 participants