Skip to content

ESQL: Add optimization to purge join on null merge key #127583

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

bpintea
Copy link
Contributor

@bpintea bpintea commented Apr 30, 2025

This adds a new logical optimization rule to purge a Join in case the merge key(s) are null. The null detection is based on recognizing a tree pattern where the join sits atop a project and/or eval (possibly a few nodes deep) which contains a reference to a null, reference which matches the join key.

It works at coordinator planning level, but it's most useful locally, after insertions of nulls in the plan on detecting missing fields.

The Join is substituted with a projection with the same attributes as the join, atop an eval with all join's right fields aliased to null.

Closes #125577.

bpintea added 2 commits April 30, 2025 19:38
This adds a new logical optimization rule to purge a Join in case the
merge key(s) are null. The null detection is based on recognizing a tree
pattern where the join sits atop a project and/or eval which contains a
reference to a null, reference which matches the join key.

It works at coordinator planning level, but it's most useful locally,
after insertions of nulls in the plan on detecting missing fields.
@bpintea bpintea added >enhancement auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v8.19.0 v9.1.0 labels Apr 30, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 30, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @bpintea, I've created a changelog YAML for you.

@@ -138,7 +138,7 @@ public String toString() {

@Override
public String nodeString() {
return child.nodeString() + " AS " + name();
return child.nodeString() + " AS " + name() + "#" + id();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly related, but not sure why we wouldn't include the id, it's easier to track which exactly reference points to an alias.

@alex-spies alex-spies self-requested a review May 2, 2025 09:35
var exchange = as(agg.child(), ExchangeExec.class);
agg = as(exchange.child(), AggregateExec.class);
var extract = as(agg.child(), FieldExtractExec.class);
var eval = as(extract.child(), EvalExec.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here we need to see the languages == null part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added it.

var joinType = join.config().type();
if (joinType == INNER || joinType == LEFT) { // other types will have different replacement logic
AttributeMap.Builder<Expression> attributeMapBuilder = AttributeMap.builder();
loop: for (var child = join.left();; child = ((UnaryPlan) child).child()) { // cast is safe as both plans are UnaryPlans
Copy link
Contributor

@astefan astefan May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this loop you are collecting all the aliases that are contained in every Project and Eval on the left hand side of the LOOKUP until a LogicalPlan of a different type is found. Are there cases where evals and projects were not merged together already by CombineEvals and CombineProjections, respectively?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there cases where evals and projects were not merged together already by CombineEvals and CombineProjections, respectively?

Yes; but:

  • if the plan goes through multiple transformations, that's OK, this rule will apply apply as soon as the pattern Join - Project/Eval is detected; and the plan will converge to a stable state eventually.
  • the plan is most useful on the data node, where the plan has stabilised already (before taking into account the local conditions, that is).

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bpintea !

I focused on the actual optimizer rule, as I think @astefan already reviewed more deeply.

I left some remarks that I think should be addressed, but could yield a good optimizer rule. That said, I think there is potential for simplifying this together with other optimizer rules, namely ReplaceMissingFieldsWithNull and PropagateEvalFoldables, and this would hinge more on placing literals (null or not) into the join config.

I'll reach out offline to discuss this.

import static org.elasticsearch.xpack.esql.plan.logical.join.JoinTypes.LEFT;

/**
* The rule matches a plan pattern having a Join on top of a Project and/or Eval. It then checks if the join's performed on a field which
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking for this pattern is restrictive IMO, but it's also not what the rule actually does. Leftover?

Also, this is repeating some of the logic of PropagateEvalFoldables. Could they share code? That one collects aliases from the plan when they point to literals (via potentially several indirections), which this rule also does. But PropagateEvalFoldables does only 1 pass to collect all aliases, while this rule descends back into the children whenever it finds a join, forgetting about the previous resolutions.

PropagateEvalFoldables has a boolean called shouldCollect. That's looks like it could become a more general predicate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking for this pattern is restrictive IMO

It is more restrictive than I'd wish for it to be more effective, indeed, but I found no other way to detect null join keys (in stabilised plans, after all transformations). Any other command doing data transformations would need executing or "invasive" analysis to determine if a null just passes through. But happy to apply modifications if I overlooked solutions.
Non-data-transforming commands are pushed out of the way (a.t.p.).

but it's also not what the rule actually does. Leftover?

No, I think that's actually what the rule does; but maybe I misunderstood the question? :)

this is repeating some of the logic of PropagateEvalFoldables. Could they share code?

The refs collection in this rule is more restrictive than that in PropagateEvalFoldables and operates on few node types. I guess there might be a way to share code, but not sure it'll make it more legible.

But PropagateEvalFoldables does only 1 pass to collect all aliases, while this rule descends back into the children whenever it finds a join, forgetting about the previous resolutions.

Right; that's because it can only apply this rule on a specific tree pattern; if there were more nodes in-between (of different types), the rule wouldn't work.

PropagateEvalFoldables has a boolean called shouldCollect. That's looks like it could become a more general predicate?

The tree traversing would be different, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking for this pattern is restrictive IMO, but it's also not what the rule actually does. Leftover?

Sorry, you are correct; I didn't notice the break in the switch below and thought we'd just go over all children but looking only for evals and projections in it.

The refs collection in this rule is more restrictive than that in PropagateEvalFoldables and operates on few node types. I guess there might be a way to share code, but not sure it'll make it more legible.

My point is that both rules look for chains of aliases and propagate the result into a command that (transitively) depneds on a literal. I think we should have only 1 way to do this; if the approach is correct for PropagateEvalFoldables, it should also be correct here - and if it's not, then PropagateEvalFoldables probably has a bug and we need to find a different solution.

Conceptually, the difference is that PropagateEvalFoldables just places a literal in the downstream command - while we rather wish to prune it - but that could be solved either by to place a literal into the join config (which I think is interesting anyway because that allows even more optimizations).

import static org.elasticsearch.xpack.esql.plan.logical.join.JoinTypes.LEFT;

/**
* The rule matches a plan pattern having a Join on top of a Project and/or Eval. It then checks if the join's performed on a field which
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking for this pattern is restrictive IMO, but it's also not what the rule actually does. Leftover?

Also, this is repeating some of the logic of PropagateEvalFoldables. Could they share code? That one collects aliases from the plan when they point to literal (via potentially several indirections), which this rule also does. But PropagateEvalFoldables does only 1 pass to collect all aliases, while this rule descends back into the children whenever it finds a join, forgetting about the previous resolutions.

Comment on lines 53 to 55
default -> {
break loop;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be other aliasing further down the plan. E.g. FROM ... | EVAL x = NULL | SORT whatever | LOOKUP JOIN ... ON x. In particular, I think this will not mesh well with the current implementation of ReplaceMissingFieldsWithNull which places EVAL missing = NULL at the very source of the query plan, right on top the EsRelation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rule applies last in a rule batch on coordinator, but more importantly, it applies on data nodes after the tree has stabilised on the coordinator, and only taking into account the local conditions can modify it (such as missing fields detection). The SORT (and LIMIT) will have been pushed further down by now. This is true for the coordinator too, btw, it just might take more rules batch iterations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't assume certain patterns for a rule to work. When more commands are added to ESQL, our assumptions will break and this means that unrelated changes can lead to (potentially very noticeable) performance regressions. Also, we might just not have thought about all the examples. For instance, couldn't there be a non-pushable WHERE clause that depends on an EVAL upstream from the join? With this approach, we wouldn't prune the join in this case even when the join key is missing from the index, I believe.

For this reason, I'd prefer if we went with a less pattern-reliant approach like in PropagateEvalFoldables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't assume certain patterns for a rule to work.

I took another look and reused the PropagateEvalFoldables logic to build the references. No particular pattern is assumed anymore. Thanks for insisting. :)

}
}
}
for (var attr : AttributeSet.of(join.config().matchFields())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this shouldn't go over the match fields, but over the left + right fields. I don't trust the match fields right now, as their contract is never enforced and they only exist because our Join modeling got wonky.

That said, for the current implementation it's probably correct: only for the left fields can we ever know that they will be null before running the execution.

protected LogicalPlan rule(Join join) {
LogicalPlan plan = join;
var joinType = join.config().type();
if (joinType == INNER || joinType == LEFT) { // other types will have different replacement logic
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a field is always null in an INNER join, isn't the result empty? No match => no row goes into the result, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a field is always null in an INNER join, isn't the result empty? No match => no row goes into the result, no?

You're right, INNER should not be considered. Fixed, thanks! Also renamed the rule to reflect this change.


private static LogicalPlan replaceJoin(Join join) {
var joinRightOutput = join.rightOutputFields();
if (joinRightOutput.isEmpty()) { // can be empty when the join key is null and the other right side entries pruned (by an agg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They right side attributes can also be pruned by a keep, drop or by being shadowed - not just by aggs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) Yes.
The (silly) intention was not to go over the line limit, this being an inlined comment. Pulled the comment on its own line and slightly extended it. (Also, the projection-related commands would be moved out by now.)

return join.left();
}
List<Alias> aliases = new ArrayList<>(joinRightOutput.size());
// TODO: cache aliases by type, à la ReplaceMissingFieldWithNull#missingToNull (tho lookup indices won't have Ks of fields)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, let's re-use that code, maybe!

Not sure if lookup indices will have much fewer fields, I expect them to grow over time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if lookup indices will have much fewer fields, I expect them to grow over time.

Assuming here, but I believe that the Ks of fields occurrence stems mostly from ECS and lookup indices would probably grow too, but controlled (as they're more "manual"?). However this can be an issue with "generic" indices joins, indeed.

Yep, let's re-use that code, maybe!

If we won't remodel our approach, "simplifying this together with other optimizer rules, namely ReplaceMissingFieldsWithNull and PropagateEvalFoldables", I'd add this in an immediate follow-up, as it'd touch ReplaceMissingFieldsWithNull... but can also do it here, if you feel strongly about this. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think this is a blocker. A follow-up is perfectly fine IMO. Thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add this in an immediate follow-up

I've pushed the change in this PR, after all.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unblocking because the ClassCastException that I thought I saw can't actually happen. My bad.

I'd prefer if we try consolidating the alias propagation logic with that from PropagateEvalFoldables and shared code. I believe this will be less brittle over time and sharing code will mean we have 1 mechanism to keep correct rather than 2.

However, the added tests and the initial approach already are an improvement in itself, so there's no reason to hold this PR.

I won't be available to review more deeply - could you please continue iterating with @astefan ?(Who already reviewed more precisely, anyway - thanks Andrei!)

@bpintea
Copy link
Contributor Author

bpintea commented May 20, 2025

I'd prefer if we try consolidating the alias propagation logic with that from PropagateEvalFoldables and shared code. I believe this will be less brittle over time and sharing code will mean we have 1 mechanism to keep correct rather than 2.

Pushed an update to reuse code from both PropagateEvalFoldables and ReplaceMissingFieldWithNull.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESQL: Skip LOOKUP JOIN when join key is missing from index
4 participants