Skip to content

Enhance logical plan explicitly projecting join keys #88833

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

luigidellaquila
Copy link
Contributor

@luigidellaquila luigidellaquila commented Jul 27, 2022

In Sample query execution, intermediary queries do not need to return all the fields from the retrieved documents: only join keys are needed to for the processing.
This PR optimizes this aspect, re-writing the logical plan to only project the join keys, avoiding to create field extractors for not needed fields.

This also avoids that the query folder tries to do field extraction on fields with no exact match (eg. "text" fields), that would result in an error.

avoid to create field extractors for all the fields
@luigidellaquila luigidellaquila marked this pull request as ready for review July 28, 2022 07:45
@luigidellaquila luigidellaquila requested a review from astefan July 28, 2022 07:45
@elasticsearchmachine elasticsearchmachine added the Team:QL (Deprecated) Meta label for query languages team label Jul 28, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a shutgun approach. While it works for sequences and samples, the semantics are a bit different: a sample doesn't need a timestamp and the fetch_size (the LimitWithOffset plan) is passed to the queries in a different way (directly through the EqlConfiguration instance), without creating an intermediate plan.

.map(FieldAttribute.class::cast)
.map(FieldAttribute::name)
.collect(toList());
assertTrue(projections.contains("@timestamp"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timestamp is not needed no?

// first per KeyedFilter
plan = plan.transformUp(KeyedFilter.class, k -> {
Project p = new Project(projectCtx, k.child(), k.extractionAttributes());
Project p = new Project(projectCtx, k.child(), k.extractionAttributes());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k.extractionAttributes returns the timestamp and tiebreaker fields, as well. A sample doesn't need these. Instead of using this method, maybe better to return only the keys.

p
);
// TODO: this could be incorporated into the query generation
LogicalPlan fetchSize = new LimitWithOffset(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this is not needed for samples. The fetch_size is passed directly (through the EqlConfiguration instance) in the ExecutionManager. Maybe use a simple projection for samples:

                plan = plan.transformUp(KeyedFilter.class, k -> {
                    Project p = new Project(projectCtx, k.child(), k.keys());
                    return new KeyedFilter(k.source(), p, k.keys(), k.timestamp(), k.tiebreaker());
                });

@astefan astefan requested review from costin and bpintea August 4, 2022 15:10
@luigidellaquila luigidellaquila requested a review from astefan August 8, 2022 09:49
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -49,10 +48,16 @@ public LogicalPlan postAnalyze(LogicalPlan plan, EqlConfiguration configuration)
Holder<Boolean> hasJoin = new Holder<>(Boolean.FALSE);

Source projectCtx = synthetic("<implicit-project>");
if (plan.anyMatch(Sequence.class::isInstance)) {
hasJoin.set(Boolean.TRUE);
if (plan.anyMatch(x -> x instanceof Sample)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sample.class::isInstance - use the method reference instead of the lambda expression.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me however the if code can be compacted (per comments) - in its current form it's confusing what happens due to the repetition between the two branches.

Project p = new Project(projectCtx, k.child(), k.keys());
return new KeyedFilter(k.source(), p, k.keys(), k.timestamp(), k.tiebreaker());
});
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new piece of code has different semantics then the previous one - the projection is now applied to all keyed filter not just those under Sequence (including Sample) which is too broad.
Second there's the issue of code duplication since both branches create new objects and do the transformUp - I suggest making them more compact:

 if (plan.anyMatch(Sequence.class::isInstance)) {
    hasJoin.set(Boolean.TRUE);
    final boolean isSample = Sequence.class.isInstance(plan);
    plan = plan.transformUp(KeyedFilter.class, k -> {
          var keys = isSample ? k.keys() : k.extractionAttributes();
          Project p = new Project(projectCtx, k.child(), keys);
          ...
    }

Essentially the if filter down into the plan transformUp method as oppose to where it is right now.

@luigidellaquila
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/part-1

preserving previous logic, so that KeyedFilter is used only for sequences and samples
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/EQL EQL querying >enhancement Team:QL (Deprecated) Meta label for query languages team v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants