Check for early termination in Driver #118188
Pinging @elastic/es-analytical-engine (Team:Analytics)
50b14e4
to
27d4a6e
Compare
I wonder if we'd be better off having the Operator subclasses yield control - if they return null then the driver loop will check for them. But they have to store their intermediate state and that's annoying. This solution is more general, though it's more "different" than how the rest of the driver stuff works. Where are we looking at putting these?
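To make the alternative in this comment concrete, here is a minimal sketch of an operator that yields by returning null and keeps its intermediate state between calls. Every name here (YieldingOperatorSketch, ChunkedSum, run, shouldStop) is hypothetical and chosen for illustration; none of it is the Driver or Operator API from this PR.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of "yield by returning null"; not code from this PR. */
final class YieldingOperatorSketch {

    /** Toy operator that sums an input array a few rows at a time. */
    static final class ChunkedSum {
        private final int[] input;
        private final int chunkSize;
        // Intermediate state that must survive between calls to getOutput().
        private int next = 0;
        private long sum = 0;

        ChunkedSum(int[] input, int chunkSize) {
            this.input = input;
            this.chunkSize = chunkSize;
        }

        /** Returns null to yield control, or the final sum once all rows are consumed. */
        Long getOutput() {
            int end = Math.min(next + chunkSize, input.length);
            while (next < end) {
                sum += input[next++];
            }
            return next == input.length ? Long.valueOf(sum) : null;
        }
    }

    /** Toy driver loop: checks for termination every time the operator yields. */
    static Long run(ChunkedSum op, BooleanSupplier shouldStop) {
        while (shouldStop.getAsBoolean() == false) {
            Long out = op.getOutput();
            if (out != null) {
                return out;
            }
        }
        return null; // stopped before the operator finished
    }
}
```

The cost the comment points out is visible in the two private fields: each operator written this way has to carry its own resume position and partial result.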
97bb696
to
376040c
Compare
71d42f1
to
6d21d28
Compare
930110c
to
1a8f5b4
Compare
Tested the newest version (b9deea7) locally and it seems to do the interruption mid-page properly.
Hi @dnhatn, I've created a changelog YAML for you.
3d6d0a8
to
2d4bf70
Compare
2d4bf70
to
b63bd27
Compare
4256727
to
fd8d5a6
Compare
@@ -41,7 +41,7 @@ public static EvalOperator.ExpressionEvaluator.Factory toEvaluator(
         return new AutomataMatchEvaluator.Factory(source, field, run, toDot(automaton));
     }

-    @Evaluator
+    @Evaluator(executionCost = 50)
Here's just one example - we might not need this for all evaluators, only for the expensive ones whose inputs or outputs are BytesRef.
I think we almost certainly don't want it for most evaluators, but I can really see the use for bailing early for slow stuff - and, I think, all of those take or produce BytesRef. Automata match, geo stuff. Otherwise I think we're frequently dealing with more overhead than the operation. And we're quite likely to break autovectorization.
I wonder if instead of a "cost" this should be a, maybe simpler, "check for termination every n rows" style. p % <constant> == 0 is going to be pretty fast, especially if the constant is, like, 1 or 2 or 4 or 8. Hell, that feels like something loop unrolling might handle, though I don't think we're really worried about loop unrolling for these cases.
One thing that's really interesting - some of these automata matches are probably quite fast and not worth the checking. But we don't differentiate. And that's fine for now.
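The "check for termination every n rows" idea could look roughly like the sketch below. It is an assumption-laden illustration, not the generated evaluator code: PeriodicCheckSketch, CHECK_INTERVAL, process and shouldStop are all made-up names, and the interval of 8 is simply one of the constants floated above.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a per-row loop with a periodic termination check. */
final class PeriodicCheckSketch {
    // Assumed interval; a small power of two keeps the modulo test cheap.
    static final int CHECK_INTERVAL = 8;

    /**
     * Walks positionCount rows and returns how many were processed
     * before shouldStop reported true (or positionCount if it never did).
     */
    static int process(int positionCount, BooleanSupplier shouldStop) {
        int processed = 0;
        for (int p = 0; p < positionCount; p++) {
            // Only consult the (possibly costly) stop signal once every CHECK_INTERVAL rows.
            if (p % CHECK_INTERVAL == 0 && shouldStop.getAsBoolean()) {
                break;
            }
            processed++; // stand-in for the expensive per-row work, e.g. an automaton match
        }
        return processed;
    }
}
```

With this shape the stop signal is read at rows 0, 8, 16 and so on, so a stop request takes effect at most CHECK_INTERVAL rows late.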
I feel like we're really
67a0e48
to
cec1749
Compare
cec1749
to
c6815f4
Compare
@nik9000 Thanks for the feedback. I have a previous implementation that unrolls loops to maintain auto-vectorization (for example: Lines 83 to 92 in 71d42f1
👍
@smalyshev @nik9000 Thanks for reviews + feedback!
This change introduces support for periodically checking for early termination. This enables early exits in the following scenarios:
1. The query has accumulated sufficient data (e.g., reaching the LIMIT).
2. The query is stopped (either by users or due to failures).
Other changes will be addressed in follow-up PRs.
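As a rough illustration of the two scenarios in this description, the sketch below folds "enough rows collected" and "query stopped" into a single check that a driver could poll periodically. The class and method names (EarlyTerminationSketch, collect, cancel, shouldTerminate) are hypothetical and are not taken from the PR.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a combined early-termination signal; not code from this PR. */
final class EarlyTerminationSketch {
    private final int limit;
    private final AtomicInteger collected = new AtomicInteger();
    private final AtomicBoolean cancelled = new AtomicBoolean();

    EarlyTerminationSketch(int limit) {
        this.limit = limit;
    }

    /** Records rows that have reached the sink (scenario 1: LIMIT reached). */
    void collect(int rows) {
        collected.addAndGet(rows);
    }

    /** Marks the query as stopped by a user or a failure (scenario 2). */
    void cancel() {
        cancelled.set(true);
    }

    /** True once either scenario applies; intended to be polled periodically. */
    boolean shouldTerminate() {
        return cancelled.get() || collected.get() >= limit;
    }
}
```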
Backported to 8.18 in #120238