-
Notifications
You must be signed in to change notification settings - Fork 25.2k
SQL: Add PIVOT support #46489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Add PIVOT support #46489
Conversation
Add initial PIVOT support for transforming a regular table into a statistics table around an arbitrary pivoting column: SELECT * FROM (SELECT languages, country, salary, FROM mp) PIVOT (AVG(salary) FOR countries IN ('NL', 'DE', 'ES', 'RO', 'US')) In the current implementation PIVOT allows only one aggregation however this restriction is likely to be lifted in the future.
Pinging @elastic/es-search |
@elasticmachine run elasticsearch-ci/2 |
@@ -3,7 +3,8 @@ | |||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file shouldn't be here - I'll remove it on the next commit once the feedback lands to avoid an extra build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work overall!
I left some comments and here are a couple of more general ones:
- Should we have some test similar to https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/sql/qa/src/main/java/org/elasticsearch/xpack/sql/qa/cli/FetchSizeTestCase.java#L56
to double check that the paging works for large row numbers - Could you please add some tests in
OptimizerTests
to validate theQueryFolder
&Optimizer
folding changes?
@@ -294,7 +294,7 @@ public void testConstantFolding() { | |||
// check now with an alias | |||
result = new ConstantFolding().rule(new Alias(EMPTY, "a", exp)); | |||
assertEquals("a", Expressions.name(result)); | |||
assertEquals(5, ((Literal) result).value()); | |||
//assertEquals(5, ((Literal) result).value()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove comment please.
@@ -76,6 +76,9 @@ public static SearchSourceBuilder sourceBuilder(QueryContainer container, QueryB | |||
// set page size | |||
if (size != null) { | |||
int sz = container.limit() > 0 ? Math.min(container.limit(), size) : size; | |||
// now take into account the the minimum page (if set) | |||
int minSize = container.minPageSize(); | |||
sz = minSize > 0 ? (Math.max(sz / minSize, 1) * minSize) : sz; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add some comment which explains this calculation?
; | ||
|
||
|
||
averageWithOneValueAndAlias |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the same as: https://github.com/elastic/elasticsearch/pull/46489/files#diff-7cd262e8f087be8a976b37e385749ccdR29
|
||
averageWithScalarOverAggregateAndFoldedValue | ||
schema::status:s|client_ip:s | ||
SELECT status, client_ip FROM logs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we include the pivoted value here?
Alias a = (Alias) e; | ||
return a.child().foldable() ? Literal.of(a.name(), a.child()) : a; | ||
} | ||
// if (e instanceof Alias) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be removed?
if (groupingSet == null) { | ||
AttributeSet columnSet = Expressions.references(singletonList(column)); | ||
// grouping can happen only on "primitive" fields, thus exclude multi-fields or nested docs | ||
// the verifier enforces this rule so it does not catches folks by surprise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
catches -> catch
UnresolvedAttribute column = new UnresolvedAttribute(source(pivotClause.column), visitQualifiedName(pivotClause.column)); | ||
List<NamedExpression> values = namedValues(pivotClause.aggs); | ||
if (values.size() > 1) { | ||
throw new ParsingException(source(pivotClause.aggs), "PIVOT currently supports only one aggregation, found [{}]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ParsingException
at this point is "late", but on the other hand SqlIllegalArgumentException
doesn't seem appropriate. Should we introduce a new one, something with 'Invalidor
Restricted` in the name?
Not necessary to do as part of this PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering there's no typed consumer of these exceptions, the fact that it's a parsing exception vs restricted or whatever we get to call it, is about irrelevant semantics.
Further more if the client sees the message wrapped (inside the UI for example), the exception type is even further minimized.
My point is, the message sent to the user is important, not the exception in which is wrapped.
out.add(value.toAttribute().withDataType(agg.dataType()).withId(id)); | ||
} | ||
} | ||
// for multiple args, concat the function and the value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we do that also for one aggregate function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No - this follows the Oracle convention which is the only one that supports multiple aggs inside the same pivot. Currently this branch is disabled due to a corner-case optimization that results in missing IDs.
@astefan @matriv I've updated the PR, please take another look at it. Thanks to a number of queries suggested by @astefan I concluded that supporting folding expressions inside values or functions inside columns leads to invalid queries due to the optimizer kicking in and creating different naming expressions which, due to the difference in ID, stops matching. As such I've beefed up the Verifier and to prevent such cases - once the |
} | ||
|
||
public void testPivotWithNull() { | ||
assertEquals("1:85: Null not allowed as a PIVOT value", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@astefan Added test for forbidding null
- it looks like it doesn't work since In
doesn't allow it and it just removes it. Which kinda makes sense (null means the thing is missing).
This could be improved by either handling nulls as a separate filter or, preferably inside In
directly.
@matriv thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When null
is inside the value list of IN
it will always yield NULL
as a result which in the case of IN
used as a filter in a normal WHERE clause means false
and therefore it's eliminated. e.g.:
postgres=# select null in (null);
?column?
----------
(1 row)
If for pivot this case should yield true
It should be treated differently but in my opinion not inside IN
but somewhere else (e.g.: QueryFolder where we have the context information of IN used in a PIVOT query).
} | ||
|
||
public void testPivotWithFunctionInput() { | ||
assertEquals("1:37: No functions allowed (yet); encountered [YEAR(date)]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No functions allowed due to namedExpression issues.
Added a minor update essentially preventing Until then, I've prevented the use of such aggs inside the verifier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you for the feedback addressing and once again great work!
@@ -152,6 +152,30 @@ SELECT * FROM (SELECT languages, gender, salary FROM test_emp) PIVOT (AVG(salary | |||
null |48396.28571428572|62140.666666666664 | |||
; | |||
|
|||
averageWithTwoValuesAndOrderDesc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AndLimit?
else if (namedExpression.foldable()) { | ||
rawValues.add(Literal.of(namedExpression.name(), namedExpression)); | ||
} | ||
// TOOD: same as above |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo -> TODO
@@ -203,18 +203,4 @@ null |31070.0 | |||
3 |26830.0 | |||
4 |24646.0 | |||
5 |23353.0 | |||
; | |||
|
|||
innerAggPivot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove this? Isn't this the case that is supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, I got confused and thought that we'd support this more trivial case.
Add initial PIVOT support for transforming a regular table into a statistics table around an arbitrary pivoting column: SELECT * FROM (SELECT languages, country, salary, FROM mp) PIVOT (AVG(salary) FOR countries IN ('NL', 'DE', 'ES', 'RO', 'US')) In the current implementation PIVOT allows only one aggregation however this restriction is likely to be lifted in the future. Also not all aggregations are working, in particular MatrixStats are not yet supported. (cherry picked from commit d912637)
* Remove the limitation of not being able to use `InnerAggregate` inside PIVOTs (aggregations using extended and matrix stats) * The limitation was introduced as part of the original `PIVOT` implementation in #46489, but after #49693 it could be lifted. * Test that the `PIVOT` results in the same query as the `GROUP BY`. This should hold across all the `AggregateFunction`s we have.
* Remove the limitation of not being able to use `InnerAggregate` inside PIVOTs (aggregations using extended and matrix stats) * The limitation was introduced as part of the original `PIVOT` implementation in elastic#46489, but after elastic#49693 it could be lifted. * Test that the `PIVOT` results in the same query as the `GROUP BY`. This should hold across all the `AggregateFunction`s we have. (cherry-pick 67704b0)
* Remove the limitation of not being able to use `InnerAggregate` inside PIVOTs (aggregations using extended and matrix stats) * The limitation was introduced as part of the original `PIVOT` implementation in #46489, but after #49693 it could be lifted. * Test that the `PIVOT` results in the same query as the `GROUP BY`. This should hold across all the `AggregateFunction`s we have. (cherry-picked from 67704b0)
Add initial PIVOT support for transforming a regular table into a
statistics table around an arbitrary pivoting column:
SELECT * FROM
(SELECT languages, country, salary, FROM mp)
PIVOT (AVG(salary) FOR countries IN ('NL', 'DE', 'ES', 'RO', 'US'))
In the current implementation PIVOT allows only one aggregation however
this restriction is likely to be lifted in the future.