Enhance logical plan explicitly projecting join keys #88833

luigidellaquila · 2022-07-27T07:27:58Z

In Sample query execution, intermediary queries do not need to return all the fields from the retrieved documents: only the join keys are needed for processing.
This PR optimizes this by rewriting the logical plan to project only the join keys, which avoids creating field extractors for fields that are not needed.

This also prevents the query folder from attempting field extraction on fields with no exact match (e.g. "text" fields), which would result in an error.
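As a rough illustration of the rewrite described above, the following self-contained sketch wraps each stage of a sample in a projection restricted to its join keys. The `Plan`, `Relation`, `Project`, `KeyedFilter` and `Sample` types here are hypothetical, heavily simplified stand-ins written for this example. They are not the real EQL classes, and the field names are made up.

```java
import java.util.List;

// Toy model only: these records are simplified stand-ins, not the real EQL plan classes.
final class JoinKeyProjectionSketch {

    interface Plan {}

    // Leaf node exposing every mapped field of the index.
    record Relation(List<String> fields) implements Plan {}

    // Keeps only the named fields of its child.
    record Project(Plan child, List<String> fields) implements Plan {}

    // One stage of a sample: a child plan plus the join keys for that stage.
    record KeyedFilter(Plan child, List<String> keys) implements Plan {}

    // A sample made of several keyed stages.
    record Sample(List<KeyedFilter> stages) implements Plan {}

    // Rewrite every stage so that its child is projected down to the join keys.
    static Sample projectJoinKeys(Sample sample) {
        List<KeyedFilter> rewritten = sample.stages().stream()
            .map(k -> new KeyedFilter(new Project(k.child(), k.keys()), k.keys()))
            .toList();
        return new Sample(rewritten);
    }

    // Fields an intermediary query would have to extract for a given stage.
    static List<String> fieldsToExtract(KeyedFilter stage) {
        Plan child = stage.child();
        if (child instanceof Project p) {
            return p.fields();
        }
        if (child instanceof Relation r) {
            return r.fields();
        }
        throw new IllegalArgumentException("unexpected child: " + child);
    }

    public static void main(String[] args) {
        Relation docs = new Relation(List.of("host", "message", "@timestamp"));
        Sample sample = new Sample(List.of(new KeyedFilter(docs, List.of("host"))));
        // Before the rewrite every field is a candidate for extraction.
        System.out.println(fieldsToExtract(sample.stages().get(0)));
        // After the rewrite only the join key remains.
        System.out.println(fieldsToExtract(projectJoinKeys(sample).stages().get(0)));
    }
}
```

In this toy model a hypothetical `message` field of type "text" is never considered for extraction once the projection is in place, which mirrors the error case mentioned in the description.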

avoid to create field extractors for all the fields

elasticsearchmachine · 2022-07-28T07:46:13Z

Pinging @elastic/es-ql (Team:QL)

astefan

This is a bit of a shotgun approach. While it works for sequences and samples, the semantics are a bit different: a sample doesn't need a timestamp, and the fetch_size (the LimitWithOffset plan) is passed to the queries in a different way (directly through the EqlConfiguration instance), without creating an intermediate plan.

astefan · 2022-08-04T13:49:52Z

x-pack/plugin/eql/src/test/java/org/elasticsearch/xpack/eql/optimizer/OptimizerTests.java

+                .map(FieldAttribute.class::cast)
+                .map(FieldAttribute::name)
+                .collect(toList());
+            assertTrue(projections.contains("@timestamp"));


@timestamp is not needed, no?

astefan · 2022-08-04T13:51:49Z

x-pack/plugin/eql/src/main/java/org/elasticsearch/xpack/eql/analysis/PostAnalyzer.java

-                // first per KeyedFilter
-                plan = plan.transformUp(KeyedFilter.class, k -> {
-                    Project p = new Project(projectCtx, k.child(), k.extractionAttributes());
+                Project p = new Project(projectCtx, k.child(), k.extractionAttributes());


k.extractionAttributes() returns the timestamp and tiebreaker fields as well. A sample doesn't need these. Instead of using this method, it may be better to return only the keys.

astefan · 2022-08-04T15:07:24Z

x-pack/plugin/eql/src/main/java/org/elasticsearch/xpack/eql/analysis/PostAnalyzer.java

-                        p
-                    );
+                // TODO: this could be incorporated into the query generation
+                LogicalPlan fetchSize = new LimitWithOffset(


Also, this is not needed for samples. The fetch_size is passed directly (through the EqlConfiguration instance) in the ExecutionManager. Maybe use a simple projection for samples:

plan = plan.transformUp(KeyedFilter.class, k -> {
    Project p = new Project(projectCtx, k.child(), k.keys());
    return new KeyedFilter(k.source(), p, k.keys(), k.timestamp(), k.tiebreaker());
});

astefan

LGTM

costin · 2022-08-10T01:57:00Z

x-pack/plugin/eql/src/main/java/org/elasticsearch/xpack/eql/analysis/PostAnalyzer.java

@@ -49,10 +48,16 @@ public LogicalPlan postAnalyze(LogicalPlan plan, EqlConfiguration configuration)
            Holder<Boolean> hasJoin = new Holder<>(Boolean.FALSE);

            Source projectCtx = synthetic("<implicit-project>");
-            if (plan.anyMatch(Sequence.class::isInstance)) {
-                hasJoin.set(Boolean.TRUE);
+            if (plan.anyMatch(x -> x instanceof Sample)) {


Sample.class::isInstance - use the method reference instead of the lambda expression.
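For readers unfamiliar with the idiom, this small sketch shows that the lambda and the method reference are interchangeable as predicates. The `Plan`, `Sample` and `Sequence` types and the `anyMatch` helper are hypothetical stand-ins defined only for this example, not the real plan classes.

```java
import java.util.List;
import java.util.function.Predicate;

// Toy model only: the nested types are stand-ins, not the real EQL plan classes.
final class MethodReferenceSketch {

    interface Plan {}

    static final class Sample implements Plan {}

    static final class Sequence implements Plan {}

    // Simplified stand-in for a tree-wide anyMatch: tests a flat list of nodes.
    static boolean anyMatch(List<Plan> nodes, Predicate<? super Plan> predicate) {
        return nodes.stream().anyMatch(predicate);
    }

    public static void main(String[] args) {
        List<Plan> nodes = List.of(new Sequence(), new Sample());
        // Lambda form, as in the diff under review.
        boolean viaLambda = anyMatch(nodes, x -> x instanceof Sample);
        // Method-reference form, as suggested in the review comment.
        boolean viaMethodRef = anyMatch(nodes, Sample.class::isInstance);
        System.out.println(viaLambda == viaMethodRef);
    }
}
```

Both forms perform the same runtime type check, so the choice is one of style and consistency with the surrounding code.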

costin

Looks good to me; however, the if code can be compacted (per comments). In its current form it's confusing what happens, due to the repetition between the two branches.

costin · 2022-08-10T03:14:19Z

x-pack/plugin/eql/src/main/java/org/elasticsearch/xpack/eql/analysis/PostAnalyzer.java

+                    Project p = new Project(projectCtx, k.child(), k.keys());
+                    return new KeyedFilter(k.source(), p, k.keys(), k.timestamp(), k.tiebreaker());
+                });
+            } else {


The new piece of code has different semantics than the previous one: the projection is now applied to all keyed filters, not just those under Sequence (including Sample), which is too broad.
Second, there's the issue of code duplication, since both branches create new objects and do the transformUp. I suggest making them more compact:

if (plan.anyMatch(Sequence.class::isInstance)) {
    hasJoin.set(Boolean.TRUE);
    final boolean isSample = Sequence.class.isInstance(plan);
    plan = plan.transformUp(KeyedFilter.class, k -> {
        var keys = isSample ? k.keys() : k.extractionAttributes();
        Project p = new Project(projectCtx, k.child(), keys);
        ...
}

Essentially, the if moves down into the plan's transformUp method, as opposed to where it is right now.
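The idea of pushing the branch into a single transformation can be sketched in isolation as follows. This is a toy model and not the merged code: `KeyedFilter`, `JoinKind` and `projections` are hypothetical names introduced for this example, and `extractionAttributes()` is assumed to return the keys followed by the timestamp and, when present, the tiebreaker.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: names and shapes are assumptions made for this sketch.
final class CompactProjectionSketch {

    enum JoinKind { SEQUENCE, SAMPLE }

    // A stage with its join keys plus the ordering fields that only sequences use.
    record KeyedFilter(List<String> keys, String timestamp, String tiebreaker) {

        // Assumed behaviour: keys, then timestamp, then tiebreaker if one is set.
        List<String> extractionAttributes() {
            List<String> attributes = new ArrayList<>(keys);
            attributes.add(timestamp);
            if (tiebreaker != null) {
                attributes.add(tiebreaker);
            }
            return attributes;
        }
    }

    // One code path for both join kinds: only the choice of projected fields differs.
    static List<List<String>> projections(JoinKind kind, List<KeyedFilter> stages) {
        final boolean isSample = kind == JoinKind.SAMPLE;
        return stages.stream()
            .map(k -> isSample ? k.keys() : k.extractionAttributes())
            .toList();
    }

    public static void main(String[] args) {
        List<KeyedFilter> stages = List.of(new KeyedFilter(List.of("host"), "@timestamp", null));
        System.out.println(projections(JoinKind.SAMPLE, stages));
        System.out.println(projections(JoinKind.SEQUENCE, stages));
    }
}
```

Keeping a single traversal and varying only the projected attributes removes the duplicated object construction that the two-branch version had.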

luigidellaquila · 2022-08-10T11:16:31Z

@elasticsearchmachine run elasticsearch-ci/part-1

preserving previous logic, so that KeyedFilter is used only for sequences and samples

Enhance logical plan explicitly projecting join keys

ec6b1cd

avoid to create field extractors for all the fields

luigidellaquila added >enhancement :Analytics/EQL EQL querying labels Jul 28, 2022

luigidellaquila marked this pull request as ready for review July 28, 2022 07:45

luigidellaquila requested a review from astefan July 28, 2022 07:45

elasticsearchmachine added the Team:QL (Deprecated) Meta label for query languages team label Jul 28, 2022

Merge branch 'feature/eql_samples' into enhancement/eql_sample_projec…

42038ec

…t_keys

luigidellaquila added the v8.5.0 label Jul 28, 2022

astefan requested changes Aug 4, 2022

astefan requested review from costin and bpintea August 4, 2022 15:10

luigidellaquila added 2 commits August 8, 2022 09:33

Merge branch 'feature/eql_samples' into enhancement/eql_sample_projec…

c9429c0

…t_keys

Implement review suggestions

8ab49f2

luigidellaquila requested a review from astefan August 8, 2022 09:49

astefan approved these changes Aug 8, 2022

bpintea approved these changes Aug 9, 2022

costin reviewed Aug 10, 2022

costin approved these changes Aug 10, 2022

Implement review suggestions

cc00d2f

Code cleanup and refactoring

35e6057

preserving previous logic, so that KeyedFilter is used only for sequences and samples

luigidellaquila merged commit 156ed98 into elastic:feature/eql_samples Aug 11, 2022

luigidellaquila mentioned this pull request Sep 6, 2022

Intermediary queries in sequences and samples should not ask for all fields to be returned #87174

Closed

luigidellaquila mentioned this pull request Nov 10, 2022

EQL samples #91312

Merged
