-
Notifications
You must be signed in to change notification settings - Fork 25.2k
ESQL: Add optimization to purge join on null merge key #127583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This adds a new logical optimization rule to purge a Join in case the merge key(s) are null. The null detection is based on recognizing a tree pattern where the join sits atop a project and/or eval which contains a reference to a null, reference which matches the join key. It works at coordinator planning level, but it's most useful locally, after insertions of nulls in the plan on detecting missing fields.
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Hi @bpintea, I've created a changelog YAML for you. |
@@ -138,7 +138,7 @@ public String toString() { | |||
|
|||
@Override | |||
public String nodeString() { | |||
return child.nodeString() + " AS " + name(); | |||
return child.nodeString() + " AS " + name() + "#" + id(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not strictly related, but not sure why we wouldn't include the id, it's easier to track which exactly reference points to an alias.
var exchange = as(agg.child(), ExchangeExec.class); | ||
agg = as(exchange.child(), AggregateExec.class); | ||
var extract = as(agg.child(), FieldExtractExec.class); | ||
var eval = as(extract.child(), EvalExec.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here we need to see the languages == null
part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, added it.
var joinType = join.config().type(); | ||
if (joinType == INNER || joinType == LEFT) { // other types will have different replacement logic | ||
AttributeMap.Builder<Expression> attributeMapBuilder = AttributeMap.builder(); | ||
loop: for (var child = join.left();; child = ((UnaryPlan) child).child()) { // cast is safe as both plans are UnaryPlans |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this loop you are collecting all the aliases that are contained in every Project and Eval on the left hand side of the LOOKUP until a LogicalPlan of a different type is found. Are there cases where evals and projects were not merged together already by CombineEvals and CombineProjections, respectively?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there cases where evals and projects were not merged together already by CombineEvals and CombineProjections, respectively?
Yes; but:
- if the plan goes through multiple transformations, that's OK, this rule will apply apply as soon as the pattern Join - Project/Eval is detected; and the plan will converge to a stable state eventually.
- the plan is most useful on the data node, where the plan has stabilised already (before taking into account the local conditions, that is).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bpintea !
I focused on the actual optimizer rule, as I think @astefan already reviewed more deeply.
I left some remarks that I think should be addressed, but could yield a good optimizer rule. That said, I think there is potential for simplifying this together with other optimizer rules, namely ReplaceMissingFieldsWithNull
and PropagateEvalFoldables
, and this would hinge more on placing literals (null or not) into the join config.
I'll reach out offline to discuss this.
...org/elasticsearch/xpack/esql/optimizer/rules/logical/local/PruneJoinOnNullMatchingField.java
Outdated
Show resolved
Hide resolved
import static org.elasticsearch.xpack.esql.plan.logical.join.JoinTypes.LEFT; | ||
|
||
/** | ||
* The rule matches a plan pattern having a Join on top of a Project and/or Eval. It then checks if the join's performed on a field which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking for this pattern is restrictive IMO, but it's also not what the rule actually does. Leftover?
Also, this is repeating some of the logic of PropagateEvalFoldables
. Could they share code? That one collects aliases from the plan when they point to literals (via potentially several indirections), which this rule also does. But PropagateEvalFoldables
does only 1 pass to collect all aliases, while this rule descends back into the children whenever it finds a join, forgetting about the previous resolutions.
PropagateEvalFoldables
has a boolean called shouldCollect
. That's looks like it could become a more general predicate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking for this pattern is restrictive IMO
It is more restrictive than I'd wish for it to be more effective, indeed, but I found no other way to detect null
join keys (in stabilised plans, after all transformations). Any other command doing data transformations would need executing or "invasive" analysis to determine if a null
just passes through. But happy to apply modifications if I overlooked solutions.
Non-data-transforming commands are pushed out of the way (a.t.p.).
but it's also not what the rule actually does. Leftover?
No, I think that's actually what the rule does; but maybe I misunderstood the question? :)
this is repeating some of the logic of PropagateEvalFoldables. Could they share code?
The refs collection in this rule is more restrictive than that in PropagateEvalFoldables
and operates on few node types. I guess there might be a way to share code, but not sure it'll make it more legible.
But PropagateEvalFoldables does only 1 pass to collect all aliases, while this rule descends back into the children whenever it finds a join, forgetting about the previous resolutions.
Right; that's because it can only apply this rule on a specific tree pattern; if there were more nodes in-between (of different types), the rule wouldn't work.
PropagateEvalFoldables has a boolean called shouldCollect. That's looks like it could become a more general predicate?
The tree traversing would be different, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking for this pattern is restrictive IMO, but it's also not what the rule actually does. Leftover?
Sorry, you are correct; I didn't notice the break in the switch below and thought we'd just go over all children but looking only for evals and projections in it.
The refs collection in this rule is more restrictive than that in PropagateEvalFoldables and operates on few node types. I guess there might be a way to share code, but not sure it'll make it more legible.
My point is that both rules look for chains of aliases and propagate the result into a command that (transitively) depneds on a literal. I think we should have only 1 way to do this; if the approach is correct for PropagateEvalFoldables, it should also be correct here - and if it's not, then PropagateEvalFoldables probably has a bug and we need to find a different solution.
Conceptually, the difference is that PropagateEvalFoldables just places a literal in the downstream command - while we rather wish to prune it - but that could be solved either by to place a literal into the join config (which I think is interesting anyway because that allows even more optimizations).
import static org.elasticsearch.xpack.esql.plan.logical.join.JoinTypes.LEFT; | ||
|
||
/** | ||
* The rule matches a plan pattern having a Join on top of a Project and/or Eval. It then checks if the join's performed on a field which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking for this pattern is restrictive IMO, but it's also not what the rule actually does. Leftover?
Also, this is repeating some of the logic of PropagateEvalFoldables
. Could they share code? That one collects aliases from the plan when they point to literal (via potentially several indirections), which this rule also does. But PropagateEvalFoldables
does only 1 pass to collect all aliases, while this rule descends back into the children whenever it finds a join, forgetting about the previous resolutions.
default -> { | ||
break loop; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could be other aliasing further down the plan. E.g. FROM ... | EVAL x = NULL | SORT whatever | LOOKUP JOIN ... ON x
. In particular, I think this will not mesh well with the current implementation of ReplaceMissingFieldsWithNull
which places EVAL missing = NULL
at the very source of the query plan, right on top the EsRelation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rule applies last in a rule batch on coordinator, but more importantly, it applies on data nodes after the tree has stabilised on the coordinator, and only taking into account the local conditions can modify it (such as missing fields detection). The SORT (and LIMIT) will have been pushed further down by now. This is true for the coordinator too, btw, it just might take more rules batch iterations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't assume certain patterns for a rule to work. When more commands are added to ESQL, our assumptions will break and this means that unrelated changes can lead to (potentially very noticeable) performance regressions. Also, we might just not have thought about all the examples. For instance, couldn't there be a non-pushable WHERE clause that depends on an EVAL upstream from the join? With this approach, we wouldn't prune the join in this case even when the join key is missing from the index, I believe.
For this reason, I'd prefer if we went with a less pattern-reliant approach like in PropagateEvalFoldables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't assume certain patterns for a rule to work.
I took another look and reused the PropagateEvalFoldables
logic to build the references. No particular pattern is assumed anymore. Thanks for insisting. :)
} | ||
} | ||
} | ||
for (var attr : AttributeSet.of(join.config().matchFields())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this shouldn't go over the match fields, but over the left + right fields. I don't trust the match fields right now, as their contract is never enforced and they only exist because our Join modeling got wonky.
That said, for the current implementation it's probably correct: only for the left fields can we ever know that they will be null before running the execution.
protected LogicalPlan rule(Join join) { | ||
LogicalPlan plan = join; | ||
var joinType = join.config().type(); | ||
if (joinType == INNER || joinType == LEFT) { // other types will have different replacement logic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a field is always null in an INNER join, isn't the result empty? No match => no row goes into the result, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a field is always null in an INNER join, isn't the result empty? No match => no row goes into the result, no?
You're right, INNER
should not be considered. Fixed, thanks! Also renamed the rule to reflect this change.
|
||
private static LogicalPlan replaceJoin(Join join) { | ||
var joinRightOutput = join.rightOutputFields(); | ||
if (joinRightOutput.isEmpty()) { // can be empty when the join key is null and the other right side entries pruned (by an agg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They right side attributes can also be pruned by a keep, drop or by being shadowed - not just by aggs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:) Yes.
The (silly) intention was not to go over the line limit, this being an inlined comment. Pulled the comment on its own line and slightly extended it. (Also, the projection-related commands would be moved out by now.)
return join.left(); | ||
} | ||
List<Alias> aliases = new ArrayList<>(joinRightOutput.size()); | ||
// TODO: cache aliases by type, à la ReplaceMissingFieldWithNull#missingToNull (tho lookup indices won't have Ks of fields) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, let's re-use that code, maybe!
Not sure if lookup indices will have much fewer fields, I expect them to grow over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if lookup indices will have much fewer fields, I expect them to grow over time.
Assuming here, but I believe that the Ks of fields occurrence stems mostly from ECS and lookup indices would probably grow too, but controlled (as they're more "manual"?). However this can be an issue with "generic" indices joins, indeed.
Yep, let's re-use that code, maybe!
If we won't remodel our approach, "simplifying this together with other optimizer rules, namely ReplaceMissingFieldsWithNull
and PropagateEvalFoldables
", I'd add this in an immediate follow-up, as it'd touch ReplaceMissingFieldsWithNull
... but can also do it here, if you feel strongly about this. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't think this is a blocker. A follow-up is perfectly fine IMO. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add this in an immediate follow-up
I've pushed the change in this PR, after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unblocking because the ClassCastException that I thought I saw can't actually happen. My bad.
I'd prefer if we try consolidating the alias propagation logic with that from PropagateEvalFoldables and shared code. I believe this will be less brittle over time and sharing code will mean we have 1 mechanism to keep correct rather than 2.
However, the added tests and the initial approach already are an improvement in itself, so there's no reason to hold this PR.
I won't be available to review more deeply - could you please continue iterating with @astefan ?(Who already reviewed more precisely, anyway - thanks Andrei!)
Pushed an update to reuse code from both |
This adds a new logical optimization rule to purge a Join in case the merge key(s) are null. The null detection is based on recognizing a tree pattern where the join sits atop a project and/or eval (possibly a few nodes deep) which contains a reference to a
null
, reference which matches the join key.It works at coordinator planning level, but it's most useful locally, after insertions of
nulls
in the plan on detecting missing fields.The Join is substituted with a projection with the same attributes as the join, atop an eval with all join's right fields aliased to null.
Closes #125577.