-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Enhance logical plan explicitly projecting join keys #88833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance logical plan explicitly projecting join keys #88833
Conversation
avoid to create field extractors for all the fields
Pinging @elastic/es-ql (Team:QL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit of a shutgun approach. While it works for sequences and samples, the semantics are a bit different: a sample doesn't need a timestamp
and the fetch_size (the LimitWithOffset plan) is passed to the queries in a different way (directly through the EqlConfiguration instance), without creating an intermediate plan.
.map(FieldAttribute.class::cast) | ||
.map(FieldAttribute::name) | ||
.collect(toList()); | ||
assertTrue(projections.contains("@timestamp")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timestamp
is not needed no?
// first per KeyedFilter | ||
plan = plan.transformUp(KeyedFilter.class, k -> { | ||
Project p = new Project(projectCtx, k.child(), k.extractionAttributes()); | ||
Project p = new Project(projectCtx, k.child(), k.extractionAttributes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k.extractionAttributes
returns the timestamp and tiebreaker fields, as well. A sample doesn't need these. Instead of using this method, maybe better to return only the keys
.
p | ||
); | ||
// TODO: this could be incorporated into the query generation | ||
LogicalPlan fetchSize = new LimitWithOffset( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this is not needed for samples. The fetch_size
is passed directly (through the EqlConfiguration instance) in the ExecutionManager. Maybe use a simple projection for samples:
plan = plan.transformUp(KeyedFilter.class, k -> {
Project p = new Project(projectCtx, k.child(), k.keys());
return new KeyedFilter(k.source(), p, k.keys(), k.timestamp(), k.tiebreaker());
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -49,10 +48,16 @@ public LogicalPlan postAnalyze(LogicalPlan plan, EqlConfiguration configuration) | |||
Holder<Boolean> hasJoin = new Holder<>(Boolean.FALSE); | |||
|
|||
Source projectCtx = synthetic("<implicit-project>"); | |||
if (plan.anyMatch(Sequence.class::isInstance)) { | |||
hasJoin.set(Boolean.TRUE); | |||
if (plan.anyMatch(x -> x instanceof Sample)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sample.class::isInstance
- use the method reference instead of the lambda expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me however the if code can be compacted (per comments) - in its current form it's confusing what happens due to the repetition between the two branches.
Project p = new Project(projectCtx, k.child(), k.keys()); | ||
return new KeyedFilter(k.source(), p, k.keys(), k.timestamp(), k.tiebreaker()); | ||
}); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new piece of code has different semantics then the previous one - the projection is now applied to all keyed filter not just those under Sequence (including Sample) which is too broad.
Second there's the issue of code duplication since both branches create new objects and do the transformUp - I suggest making them more compact:
if (plan.anyMatch(Sequence.class::isInstance)) {
hasJoin.set(Boolean.TRUE);
final boolean isSample = Sequence.class.isInstance(plan);
plan = plan.transformUp(KeyedFilter.class, k -> {
var keys = isSample ? k.keys() : k.extractionAttributes();
Project p = new Project(projectCtx, k.child(), keys);
...
}
Essentially the if filter down into the plan transformUp method as oppose to where it is right now.
@elasticsearchmachine run elasticsearch-ci/part-1 |
preserving previous logic, so that KeyedFilter is used only for sequences and samples
In Sample query execution, intermediary queries do not need to return all the fields from the retrieved documents: only join keys are needed to for the processing.
This PR optimizes this aspect, re-writing the logical plan to only project the join keys, avoiding to create field extractors for not needed fields.
This also avoids that the query folder tries to do field extraction on fields with no exact match (eg. "text" fields), that would result in an error.