-
Notifications
You must be signed in to change notification settings - Fork 25.2k
SQL: Fix incorrect parameter resolution #63710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Parameters in the SQL query marked as `?` turn into an `UnresolvedAlias` in the parsed tree, aliasing (without a name) the `Literal` representing the passed in parameter value. For example in the query `SELECT ? FROM test` with a single integer (value = 100) parameter the ? turns into an `UnresolvedAlias (child=Literal(100))` or if we try to print it back into SQL, `SELECT 100 as ? FROM test`. A problem arises when multiple `?` is in the query, since the `SELECT ?, ? FROM test` query with integer parameters (value=100, 200) will turn into `SELECT 100 as ?, 200 as ? FROM test`. You guessed it correctly, name conflict is the underlying issue. The `UnresolvedAlias`es will be assigned the same name and are turned into `Alias(name="?")` objects by the `Analyzer`. The problem ultimately comes down to the fact that `Alias`es in the `QueryFolder`. The `Alias`es eventually put into an `AttributeMap`, but not directly. First the collection of `Alias`es gets turned into a `java.util.LinkedHashMap<Attribute, Expression>` which later get's turned an `AttributeMap`. The key in the map gets created using `Alias .toAttribute()`, but the `LinkedHashMap<>` uses the `Attribute.equals()` and `Attribute.hashCode()` methods which only take into the account the `name` field of the attribute vs `Attribute.semanticEquals()` and `Attribute.semanticHash()` that take into account the `name` and `id` fields (used by the `AttributeMap`). The name-based equality check in the `LinkedHashMap` essentially de-duplicates the `Alias`es based on the name and that is why one of the attributes cannot be resolved (resolution uses both `name` and `id`). This fix deprecates the `AttributeMap(Map)` constructor, makes sure that the `AttributeMap`s are populated from `Stream`s or `Iterable`s directly, so no deduplication happens based on the names of the `Attribute`s. Note: The `SELECT 100 as ?, 200 as ? FROM test` query will result in two columns named as `?` in the results. Alternatively I could have just generated different names for the `UnresolvedAlias` objects, but that would just hide the above mentioned bug in the code. Fix elastic#56013
Pinging @elastic/es-ql (:Query Languages/SQL) |
I had a suspicion that this will happen. This change of mine broke the subselects. I guess this was the reason of using vanilla maps as an input of the |
For example in the query `SELECT ? FROM test` with a single integer (value = 100) parameter the ? turns into an `UnresolvedAlias (child=Literal(100))` or if we try to print it back into SQL, `SELECT 100 as ? FROM test`. A problem arises when multiple `?` is in the query with the same type. For example the `SELECT ?, ? FROM test` query with integer parameters (value=100, 200) will turn into `SELECT 100 as ?, 200 as ? FROM test`. This commit changes the way aliases are assigned: the first unaliased parameter will stay `?`, but from the 2nd unaliased parameter literal a counter will be added to the alias (`?2`, `?3`, ...). Callout: This will introduce a change for the users: a query with multiple unaliased parameters (but each parameter with different types) successfully executed before and returned the results with multiple `?` columns (although must have accessed via index from the result set instead of alias). Fix elastic#56013
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach works but being downstream, in the Analyzer, makes things more complicated.
Instead of passing the parameters all the way to the analyzer, it's easier to assign the proper alias directly in the parser when creating the literal in the first place.
See ExpressionBuilder#visitParamLiteral
- the ordinal can be created on the fly or further more inferred into the map since that information is already available.
Also I opt for consistency - if we add indices, it makes sense to do that for ?
signs including the first one.
The question remains whether the count should start at 0 or 1 depending on the rest of the jdbc/odbc API conventions.
Pushed down the changes from the Note: Indexing of the parameters changed, Question: Should we push this into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Callout: This will introduce a change for the users: a query with
multiple unaliased parameters (but each parameter with different
types) successfully executed before and returned the results with
multiple ? columns (although must have accessed via index from the
result set instead of alias). Now all, but the first column will have a
suffix ( ? -> ?n).
I think aliasing with ?
is incorrect and in fact we should alias with the actual value.
I'd check with other DBs to see how they handle this:
SELECT ?
is the same as SELECT <param>
so I would expect this to return the value aliases as not ?.
@@ -956,8 +956,7 @@ private boolean hasUnresolvedAliases(List<? extends NamedExpression> expressions | |||
|
|||
private List<NamedExpression> assignAliases(List<? extends NamedExpression> exprs) { | |||
List<NamedExpression> newExpr = new ArrayList<>(exprs.size()); | |||
for (int i = 0; i < exprs.size(); i++) { | |||
NamedExpression expr = exprs.get(i); | |||
for (NamedExpression expr : exprs) { | |||
NamedExpression transformed = (NamedExpression) expr.transformUp(ua -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert. The PRs should contain the minimal number of changes needed and not affect the rest of the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, leftover from prev commits.
if (!ctx.getStart().equals(ctx.getStop())) { | ||
throw new ParsingException(source(ctx), "Single PARAM literal expected"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect style (use == false
instead of !
).
I'm not sure what this check tries to prevent...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: any reason why this is not part of checkStyle
and the precommit
checks?
I will remove this check, it was more like a sanity check (during unit & integ test) for me to see if there are any possibility where the start and stop token won't be the same. This should never happen.
throw new ParsingException("Not enough actual parameters {} ", params.size()); | ||
} | ||
paramTokens.put(token, params.get(param)); | ||
param++; | ||
paramTokens.put(token, new SqlParameter(paramIndex+1, params.get(paramIndex))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
incorrect formatting paramIndex + 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: will do, again would be great to add it to checkStyle
I think a much easier fix is if in |
Filter filter = (Filter) project.children().get(0); | ||
assertThat(filter.condition(), isA(LessThan.class)); | ||
LessThan condition = (LessThan) filter.condition(); | ||
assertThat(condition.left(), isA(Literal.class)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In *QL (and Elasticsearch) codebase, the general style for these assertions is either assertThat(X, instanceof(Y))
(more in ES) or assertTrue(X instanceof Y)
(more in *QL). Mockito isA
seems to be used in the x-pack Security plugin only. I'd suggest going with a *QL-wide consistent approach and use the assertTrue
variant, if it's not too much trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or assertEquals(Literal.class, condition.left().getClass()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think aliasing with
?
is incorrect and in fact we should alias with the actual value.
I'd check with other DBs to see how they handle this:
PostgreSQL supports alias on the ?
params. For a query like:
SELECT ? AS x, ? FROM test
executed with JDBC prepared stmt, the returned column names are:
x
and ?column?
respectively.
Filter filter = (Filter) project.children().get(0); | ||
assertThat(filter.condition(), isA(LessThan.class)); | ||
LessThan condition = (LessThan) filter.condition(); | ||
assertThat(condition.left(), isA(Literal.class)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or assertEquals(Literal.class, condition.left().getClass()
After a lots of back and forth and discussions here is a summary of the issue and the root cause:
There are two separate issues at play here:
It was considered to simpy fix the second issue by just reassigning the same Notes and more contextReassigning Further, think about the following case: SELECT ?, ?, ? FROM test with params This would be translated into: SELECT 100 AS ?#1, 100 AS ?#2, 200 AS ?#3 FROM test Once we change the IDs to match on the SELECT 100 AS ?#1, 100 AS ?#1, 200 AS ?#3 FROM test This will result in three reference attributes that we try to put into a
But this will exactly lead to the same issue, we will get
Which is incorrect. It should be:
Why does this happen? Becase the The only way to fix this is to additionally to reassigning the I would add to this that the following case also fails with
Which won’t be fixed, even if I re-alias the parameter literals to their values ( Conclusion: No matter what, I think we should fix the way Once we did that, we can also think about whether or not to reuse the same |
Attribute attr = a.toAttribute(); | ||
collectRefs.put(attr, a.child()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will merge these two lines before squash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need the checks to pass before merging so you'll have to do it anyway, better do it sooner rather than later.
As a separate PR I will also remove the deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
sameConstantsWithLimit | ||
SELECT 3, 3, 5 FROM "test_emp" LIMIT 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Work adding a few more variations here:
SELECT 3, 5, 3
, SELECT 5, 3, 3, 3
with and without FROM.
Attribute attr = a.toAttribute(); | ||
collectRefs.put(attr, a.child()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need the checks to pass before merging so you'll have to do it anyway, better do it sooner rather than later.
Worth creating/linking the issue around |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Thx a lot for this effort!
AttributeMap<Expression> newAttributeMap = mapBuilder.build(); | ||
|
||
assertTrue(newAttributeMap.containsKey(param1.toAttribute())); | ||
assertTrue(newAttributeMap.containsKey(param2.toAttribute())); // no more unknown attribute exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also check the returned values with get() from the map, can be done in the next PR though.
assertNotNull(collectRefs.get(param2.toAttribute())); | ||
AttributeMap<Expression> attributeMap = new AttributeMap<>(collectRefs); | ||
|
||
// validate that all Alias can be e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "~ wrapper collided" or some other comment correction (can be the next PR, obv.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* SQL: Fix incorrect parameter resolution Summary of the issue and the root cause: ``` (1) SELECT 100, 100 -> success (2) SELECT ?, ? (with params: 100, 100) -> success (3) SELECT 100, 100 FROM test -> Unknown output attribute exception for the second 100 (4) SELECT ?, ? FROM test (params: 100, 100) -> Unknown output attribute exception for the second ? (5) SELECT field1 as "x", field1 as "x" FROM test -> Unknown output attribute exception for the second "x" ``` There are two separate issues at play here: 1. Construction of `AttributeMap`s keeps only one of the `Attribute`s with the same name even if the `id`s are different (see the `AttributeMapTests` in this PR). This should be fixed no matter what, we should not overwrite attributes with one another during the construction of the `AttributeMap`. 2. The `id` on the `Alias`es is not the same in case the `Alias`es have the same `name` and same `child` It was considered to simpy fix the second issue by just reassigning the same `id`s to the `Alias`es with the same name and child, but it would not solve the `unknown output attribute exception` (see notes below). This PR covers the fix for the first issue. Fix elastic#56013
* SQL: Fix incorrect parameter resolution Summary of the issue and the root cause: ``` (1) SELECT 100, 100 -> success (2) SELECT ?, ? (with params: 100, 100) -> success (3) SELECT 100, 100 FROM test -> Unknown output attribute exception for the second 100 (4) SELECT ?, ? FROM test (params: 100, 100) -> Unknown output attribute exception for the second ? (5) SELECT field1 as "x", field1 as "x" FROM test -> Unknown output attribute exception for the second "x" ``` There are two separate issues at play here: 1. Construction of `AttributeMap`s keeps only one of the `Attribute`s with the same name even if the `id`s are different (see the `AttributeMapTests` in this PR). This should be fixed no matter what, we should not overwrite attributes with one another during the construction of the `AttributeMap`. 2. The `id` on the `Alias`es is not the same in case the `Alias`es have the same `name` and same `child` It was considered to simpy fix the second issue by just reassigning the same `id`s to the `Alias`es with the same name and child, but it would not solve the `unknown output attribute exception` (see notes below). This PR covers the fix for the first issue. Fix elastic#56013
Summary of the issue and the root cause: ``` (1) SELECT 100, 100 -> success (2) SELECT ?, ? (with params: 100, 100) -> success (3) SELECT 100, 100 FROM test -> Unknown output attribute exception for the second 100 (4) SELECT ?, ? FROM test (params: 100, 100) -> Unknown output attribute exception for the second ? (5) SELECT field1 as "x", field1 as "x" FROM test -> Unknown output attribute exception for the second "x" ``` There are two separate issues at play here: 1. Construction of `AttributeMap`s keeps only one of the `Attribute`s with the same name even if the `id`s are different (see the `AttributeMapTests` in this PR). This should be fixed no matter what, we should not overwrite attributes with one another during the construction of the `AttributeMap`. 2. The `id` on the `Alias`es is not the same in case the `Alias`es have the same `name` and same `child` It was considered to simpy fix the second issue by just reassigning the same `id`s to the `Alias`es with the same name and child, but it would not solve the `unknown output attribute exception` (see notes below). This PR covers the fix for the first issue. Relates to #56013
Summary of the issue and the root cause: ``` (1) SELECT 100, 100 -> success (2) SELECT ?, ? (with params: 100, 100) -> success (3) SELECT 100, 100 FROM test -> Unknown output attribute exception for the second 100 (4) SELECT ?, ? FROM test (params: 100, 100) -> Unknown output attribute exception for the second ? (5) SELECT field1 as "x", field1 as "x" FROM test -> Unknown output attribute exception for the second "x" ``` There are two separate issues at play here: 1. Construction of `AttributeMap`s keeps only one of the `Attribute`s with the same name even if the `id`s are different (see the `AttributeMapTests` in this PR). This should be fixed no matter what, we should not overwrite attributes with one another during the construction of the `AttributeMap`. 2. The `id` on the `Alias`es is not the same in case the `Alias`es have the same `name` and same `child` It was considered to simpy fix the second issue by just reassigning the same `id`s to the `Alias`es with the same name and child, but it would not solve the `unknown output attribute exception` (see notes below). This PR covers the fix for the first issue. Relates to #56013
Use the `AttributeMap.builder()` instead of the `AttributeMap(Map)` to construct immutable `AttributeMap`s. Clean up after the `ctor` deprecation in #63710
Use the `AttributeMap.builder()` instead of the `AttributeMap(Map)` to construct immutable `AttributeMap`s. Clean up after the `ctor` deprecation in elastic#63710
For example in the query
SELECT ? FROM test
with a single integer(value = 100) parameter the ? turns into an
Alias (child=Literal(100),name="?")
or if we try to print it back into SQL,SELECT 100 as ? FROM test
.A problem arises when multiple
?
is in the query with the same type.For example the
SELECT ?, ? FROM test
query with integer parameters(value=100, 200) will turn into
SELECT 100 as ?, 200 as ? FROM test
.This commit changes the way aliases are assigned: the first unaliased
parameter will stay
?
, but from the 2nd unaliased parameter literala counter will be added to the alias (
?2
,?3
, ...).Callout: This will introduce a change for the users: a query with
multiple unaliased parameters (but each parameter with different
types) successfully executed before and returned the results with
multiple
?
columns (although must have accessed via index from theresult set instead of alias). Now all, but the first column will have a
suffix (
?
->?n
).Fix #56013