-
Notifications
You must be signed in to change notification settings - Fork 25.2k
FORK - allow EVAL/DISSECT/STATS in branches #125937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 8 commits
3a40183
39583e6
f702160
2e638ec
1462449
8ce51f4
68c276a
abce762
f2fa8c0
ea7d8a8
2c1056a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -153,8 +153,9 @@ | |
public class Analyzer extends ParameterizedRuleExecutor<LogicalPlan, AnalyzerContext> { | ||
// marker list of attributes for plans that do not have any concrete fields to return, but have other computed columns to return | ||
// ie from test | stats c = count(*) | ||
public static final String NO_FIELDS_NAME = "<no-fields>"; | ||
public static final List<Attribute> NO_FIELDS = List.of( | ||
new ReferenceAttribute(Source.EMPTY, "<no-fields>", DataType.NULL, Nullability.TRUE, null, true) | ||
new ReferenceAttribute(Source.EMPTY, NO_FIELDS_NAME, DataType.NULL, Nullability.TRUE, null, true) | ||
); | ||
|
||
private static final List<Batch<LogicalPlan>> RULES = List.of( | ||
|
@@ -499,6 +500,10 @@ protected LogicalPlan rule(LogicalPlan plan, AnalyzerContext context) { | |
return resolveKeep(p, childrenOutput); | ||
} | ||
|
||
if (plan instanceof Fork f) { | ||
return resolveFork(f, context); | ||
} | ||
|
||
if (plan instanceof Eval p) { | ||
return resolveEval(p, childrenOutput); | ||
} | ||
|
@@ -714,6 +719,62 @@ private Join resolveLookupJoin(LookupJoin join) { | |
return join; | ||
} | ||
|
||
private LogicalPlan resolveFork(Fork fork, AnalyzerContext context) { | ||
// we align the outputs of the sub plans such that they have the same columns | ||
boolean changed = false; | ||
List<LogicalPlan> newSubPlans = new ArrayList<>(); | ||
Set<String> forkColumns = fork.outputSet().names(); | ||
|
||
for (LogicalPlan logicalPlan : fork.children()) { | ||
Source source = logicalPlan.source(); | ||
|
||
// find the missing columns | ||
List<Attribute> missing = new ArrayList<>(); | ||
Set<String> currentNames = logicalPlan.outputSet().names(); | ||
for (Attribute attr : fork.outputSet()) { | ||
if (currentNames.contains(attr.name()) == false) { | ||
missing.add(attr); | ||
} | ||
} | ||
|
||
List<Alias> aliases = missing.stream() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I kept this as it is since I felt using |
||
.map(attr -> new Alias(source, attr.name(), Literal.of(attr, null))) | ||
.collect(Collectors.toList()); | ||
; | ||
ioanatia marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
// add the missing columns | ||
if (aliases.size() > 0) { | ||
logicalPlan = new Eval(source, logicalPlan, aliases); | ||
changed = true; | ||
} | ||
|
||
List<String> subPlanColumns = logicalPlan.output().stream().map(Attribute::name).collect(Collectors.toList()); | ||
ioanatia marked this conversation as resolved.
Show resolved
Hide resolved
|
||
// We need to add an explicit Keep even if the outputs align | ||
// This is because at the moment the sub plans are executed and optimized separately and the output might change | ||
// during optimizations. Once we add streaming we might not need to add a Keep when the outputs already align. | ||
if (logicalPlan instanceof Keep == false || subPlanColumns.equals(forkColumns) == false) { | ||
ioanatia marked this conversation as resolved.
Show resolved
Hide resolved
|
||
changed = true; | ||
List<Attribute> newOutput = new ArrayList<>(); | ||
for (String attrName : forkColumns) { | ||
for (Attribute subAttr : logicalPlan.output()) { | ||
if (attrName.equals(subAttr.name())) { | ||
newOutput.add(subAttr); | ||
} | ||
} | ||
} | ||
logicalPlan = new Keep(logicalPlan.source(), logicalPlan, newOutput); | ||
} | ||
|
||
newSubPlans.add(logicalPlan); | ||
} | ||
|
||
if (changed == false) { | ||
return fork; | ||
} | ||
|
||
return new Fork(fork.source(), newSubPlans); | ||
ioanatia marked this conversation as resolved.
Show resolved
Hide resolved
|
||
} | ||
|
||
private LogicalPlan resolveRerank(Rerank rerank, List<Attribute> childrenOutput) { | ||
List<Alias> newFields = new ArrayList<>(); | ||
boolean changed = false; | ||
|
Large diffs are not rendered by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use
processingCommand
and exclude the commands that are not supported byFork
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am actually having a bit of trouble with the grammar.
Even if I use
processingCommand
here, there are some combinations that fail with a parsing exception:this works:
this fails with a parsing exception:
error:
I am able to use WHERE/LIMIT/SORT/DISSECT/EVAL/STATS without issues.
But using commands like MV_EXPAND/KEEP/RENAME/DROP/GROK in FORK branches fails with a parsing errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that ultimately we should be able to effectively remove this list and replace it with
processingCommand
(that was my original intention when I added it), but this PR is a good step forward in that direction. Let's decouple this, as it will need even more extensive and new testing which is better in a subsequent PR.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GROK
does not throwParsingException
for me if it is added underforkSubQueryProcessingCommand
. I wonder if it is related to the mode of the commands, the commands that can be recognized underFORK
haveEXPRESSION_MODE
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are probably right - I'd like to follow up on the grammar issue separately if that's okay.
if I recall correctly for
GROK
I was hitting a parsing issue when the FORK subbranch contained multiple commands and not justGROK
.