Skip to content

SQL: Fix incorrect parameter resolution #63710

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Nov 4, 2020
Merged

Conversation

palesz
Copy link
Contributor

@palesz palesz commented Oct 15, 2020

For example in the query SELECT ? FROM test with a single integer
(value = 100) parameter the ? turns into an Alias (child=Literal(100),name="?") or if we try to print it back into SQL, SELECT 100 as ? FROM test.

A problem arises when multiple ? is in the query with the same type.
For example the SELECT ?, ? FROM test query with integer parameters
(value=100, 200) will turn into SELECT 100 as ?, 200 as ? FROM test.

This commit changes the way aliases are assigned: the first unaliased
parameter will stay ?, but from the 2nd unaliased parameter literal
a counter will be added to the alias (?2, ?3, ...).

Callout: This will introduce a change for the users: a query with
multiple unaliased parameters (but each parameter with different
types) successfully executed before and returned the results with
multiple ? columns (although must have accessed via index from the
result set instead of alias). Now all, but the first column will have a
suffix ( ? -> ?n).

Fix #56013

Parameters in the SQL query marked as `?` turn into an `UnresolvedAlias`
in the parsed tree, aliasing (without a name) the `Literal` representing
the passed in parameter value.

For example in the query `SELECT ? FROM test` with a single integer
(value = 100) parameter the ? turns into an `UnresolvedAlias
(child=Literal(100))` or if we try to print it back into SQL, `SELECT
100 as ? FROM test`.

A problem arises when multiple `?` is in the query, since the `SELECT ?,
? FROM test` query with integer parameters (value=100, 200) will turn
into `SELECT 100 as ?, 200 as ? FROM test`. You guessed it correctly,
name conflict is the underlying issue.

The `UnresolvedAlias`es will be assigned the same name and are turned
into `Alias(name="?")` objects by the `Analyzer`.
The problem ultimately comes down to the fact that `Alias`es in the
`QueryFolder`. The `Alias`es eventually put into an `AttributeMap`,
but not directly. First the collection of `Alias`es gets turned into a
`java.util.LinkedHashMap<Attribute, Expression>` which later get's
turned an `AttributeMap`. The key in the map gets created using `Alias
.toAttribute()`, but the `LinkedHashMap<>` uses the `Attribute.equals()`
 and `Attribute.hashCode()` methods which only take into the account the
`name` field of the attribute vs `Attribute.semanticEquals()` and
`Attribute.semanticHash()` that take into account the `name` and `id`
fields (used by the `AttributeMap`).
The name-based equality check in the `LinkedHashMap` essentially
de-duplicates the `Alias`es based on the name and that is why one of the
attributes cannot be resolved (resolution uses both `name` and `id`).

This fix deprecates the `AttributeMap(Map)` constructor, makes sure that
the `AttributeMap`s are populated from `Stream`s or `Iterable`s
directly, so no deduplication happens based on the names of the
`Attribute`s.

Note: The `SELECT 100 as ?, 200 as ? FROM test` query will result in two
columns named as `?` in the results. Alternatively I could have just
generated different names for the `UnresolvedAlias` objects, but that
would just hide the above mentioned bug in the code.

Fix elastic#56013
@palesz palesz added >bug :Analytics/SQL SQL querying v8.0.0 Team:QL (Deprecated) Meta label for query languages team v7.11.0 v7.10.1 labels Oct 15, 2020
@palesz palesz requested a review from costin October 15, 2020 00:16
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (:Query Languages/SQL)

@palesz palesz requested a review from bpintea October 15, 2020 00:17
@palesz
Copy link
Contributor Author

palesz commented Oct 15, 2020

I had a suspicion that this will happen. This change of mine broke the subselects. I guess this was the reason of using vanilla maps as an input of the AttributeMap constructors. Tomorrow I will take another stab.

@palesz palesz marked this pull request as draft October 15, 2020 01:03
Andras Palinkas added 3 commits October 27, 2020 20:54
For example in the query `SELECT ? FROM test` with a single integer
(value = 100) parameter the ? turns into an `UnresolvedAlias
(child=Literal(100))` or if we try to print it back into SQL, `SELECT
100 as ? FROM test`.

A problem arises when multiple `?` is in the query with the same type.
For example the `SELECT ?, ? FROM test` query with integer parameters
(value=100, 200) will turn into `SELECT 100 as ?, 200 as ? FROM test`.

This commit changes the way aliases are assigned: the first unaliased
parameter will stay `?`, but from the 2nd unaliased parameter literal
a counter will be added to the alias (`?2`, `?3`, ...).

Callout: This will introduce a change for the users: a query with
multiple unaliased parameters (but each parameter with different
types) successfully executed before and returned the results with
multiple `?` columns (although must have accessed via index from the
result set instead of alias).

Fix elastic#56013
@palesz palesz marked this pull request as ready for review October 27, 2020 21:47
@costin costin requested a review from astefan October 28, 2020 06:41
Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach works but being downstream, in the Analyzer, makes things more complicated.
Instead of passing the parameters all the way to the analyzer, it's easier to assign the proper alias directly in the parser when creating the literal in the first place.
See ExpressionBuilder#visitParamLiteral - the ordinal can be created on the fly or further more inferred into the map since that information is already available.

Also I opt for consistency - if we add indices, it makes sense to do that for ? signs including the first one.
The question remains whether the count should start at 0 or 1 depending on the rest of the jdbc/odbc API conventions.

@palesz palesz requested a review from costin October 28, 2020 22:02
@palesz
Copy link
Contributor Author

palesz commented Oct 28, 2020

Pushed down the changes from the Analyzer into the parser (ExpressionBuilder#visitSelectExpression) based on our discussion.

Note: Indexing of the parameters changed, SELECT ?, ? as x, ? FROM test with parameters (100, 200) will be parsed into SELECT 100 as ?1, ? as x, 200 as ?3 FROM test, so the suffixes represent the position of the original parameter (starting with 1 to mimic the JDBC parameter indexes). This will introduce a breaking change to the client, column names will change from ? to ?<param index> in all unaliased cases.
I don't think there are a lot of critical customer SQL queries out there with unaliased parameters, but still we might brake some data pipelines/reporting that worked before.

Question: Should we push this into master only, or should we also release it in 7.x? I tend to think that we should only keep it in master.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Callout: This will introduce a change for the users: a query with
multiple unaliased parameters (but each parameter with different
types) successfully executed before and returned the results with
multiple ? columns (although must have accessed via index from the
result set instead of alias). Now all, but the first column will have a
suffix ( ? -> ?n).

I think aliasing with ? is incorrect and in fact we should alias with the actual value.
I'd check with other DBs to see how they handle this:

SELECT ? is the same as SELECT <param> so I would expect this to return the value aliases as not ?.

@@ -956,8 +956,7 @@ private boolean hasUnresolvedAliases(List<? extends NamedExpression> expressions

private List<NamedExpression> assignAliases(List<? extends NamedExpression> exprs) {
List<NamedExpression> newExpr = new ArrayList<>(exprs.size());
for (int i = 0; i < exprs.size(); i++) {
NamedExpression expr = exprs.get(i);
for (NamedExpression expr : exprs) {
NamedExpression transformed = (NamedExpression) expr.transformUp(ua -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert. The PRs should contain the minimal number of changes needed and not affect the rest of the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, leftover from prev commits.

Comment on lines 781 to 782
if (!ctx.getStart().equals(ctx.getStop())) {
throw new ParsingException(source(ctx), "Single PARAM literal expected");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect style (use == false instead of !).
I'm not sure what this check tries to prevent...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: any reason why this is not part of checkStyle and the precommit checks?
I will remove this check, it was more like a sanity check (during unit & integ test) for me to see if there are any possibility where the start and stop token won't be the same. This should never happen.

throw new ParsingException("Not enough actual parameters {} ", params.size());
}
paramTokens.put(token, params.get(param));
param++;
paramTokens.put(token, new SqlParameter(paramIndex+1, params.get(paramIndex)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incorrect formatting paramIndex + 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: will do, again would be great to add it to checkStyle

@costin
Copy link
Member

costin commented Oct 29, 2020

I think a much easier fix is if in ExpressionBuilder#visitParamLiteral the original source is not preserved and instead it is replaced by the literal value.

Filter filter = (Filter) project.children().get(0);
assertThat(filter.condition(), isA(LessThan.class));
LessThan condition = (LessThan) filter.condition();
assertThat(condition.left(), isA(Literal.class));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In *QL (and Elasticsearch) codebase, the general style for these assertions is either assertThat(X, instanceof(Y)) (more in ES) or assertTrue(X instanceof Y) (more in *QL). Mockito isA seems to be used in the x-pack Security plugin only. I'd suggest going with a *QL-wide consistent approach and use the assertTrue variant, if it's not too much trouble.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or assertEquals(Literal.class, condition.left().getClass()

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@costin

I think aliasing with ? is incorrect and in fact we should alias with the actual value.
I'd check with other DBs to see how they handle this:

PostgreSQL supports alias on the ? params. For a query like:
SELECT ? AS x, ? FROM test
executed with JDBC prepared stmt, the returned column names are:
x and ?column? respectively.

Filter filter = (Filter) project.children().get(0);
assertThat(filter.condition(), isA(LessThan.class));
LessThan condition = (LessThan) filter.condition();
assertThat(condition.left(), isA(Literal.class));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or assertEquals(Literal.class, condition.left().getClass()

@palesz
Copy link
Contributor Author

palesz commented Nov 3, 2020

After a lots of back and forth and discussions here is a summary of the issue and the root cause:

(1) SELECT 100, 100 -> success
(2) SELECT ?, ? (with params: 100, 100) -> success
(3) SELECT 100, 100 FROM test -> Unknown output attribute exception for the second 100
(4) SELECT ?, ? FROM test (params: 100, 100) -> Unknown output attribute exception for the second ?
(5) SELECT field1 as "x", field1 as "x" FROM test -> Unknown output attribute exception for the second "x"

There are two separate issues at play here:

  1. Construction of AttributeMaps keeps only one of the Attributes with the same name even if the ids are different (see the AttributeMapTests in this PR). This should be fixed no matter what, we should not overwrite attributes with one another during the construction of the AttributeMap.
  2. The id on the Aliases is not the same in case the Aliases have the same name and same child

It was considered to simpy fix the second issue by just reassigning the same ids to the Aliases with the same name and child, but it would not solve the unknown output attribute exception (see notes below). This PR covers the fix for the first issue and I'll create a new PR (created, see #64652) for the second issue which is not urgent to fix, rather good to have.


Notes and more context

Reassigning ids would not work in the Analyzer/ReplaceAlias either, as long I need to transform Alias -> Alias (just id change), the node.transform... functions will not change the tree if the transformed object is equal to the original one. Which it will be, since neither the equals or the semanticEquals of the Alias take into account the id field (id field only counts for the Attribute#semanticEquals, not even for the Attribute#equals) --> We can only do this id change where the Literal is wrapped into an Alias and where the Alias is created, so the tree changes after the transform from the node#transform...’s point of view.

Further, think about the following case:

SELECT ?, ?, ? FROM test

with params 100, 100, 200.

This would be translated into:

SELECT 100 AS ?#1, 100 AS ?#2, 200 AS ?#3 FROM test

Once we change the IDs to match on the Aliases with the same literals (only Alias have id, Literal does not), we get:

SELECT 100 AS ?#1, 100 AS ?#1, 200 AS ?#3 FROM test

This will result in three reference attributes that we try to put into a Map and later into an AttributeMap:

?#{r}1 (ReferenceAttribute(name=?, id=1, type=INTEGER) -> 100
?#{r}1 (ReferenceAttribute(name=?, id=1, type=INTEGER) -> 100
?#{r}3 (ReferenceAttribute(name=?, id=3, type=INTEGER) -> 200

But this will exactly lead to the same issue, we will get unknown output attribute because the AttributeMap will look like this:

?#{r}1 (ReferenceAttribute(name=?, id=1, type=INTEGER) -> 200

Which is incorrect. It should be:

?#{r}1 (ReferenceAttribute(name=?, id=1, type=INTEGER) -> 100
?#{r}3 (ReferenceAttribute(name=?, id=3, type=INTEGER) -> 200

Why does this happen? Becase the List<Alias>es get transformed into a simple LinkedHashMap via map.put(alias.toAttribute(), alias.child) and later it is turned into an AttributeMap via the AttributeMap(Map) constructor. But the map.put uses the Attribute#equals which only checks the name, but not the id. The AttributeMap uses the Attribute#semanticEquals which checks the id, but by the time the AttributeMap gets populated the damage is already done, one of the attributes is lost.

The only way to fix this is to additionally to reassigning the ids, we also change the aliasing from 100 as ? to 100 as 100. But this is to workaround the fact that the way we construct the AttributeMap is not correct right now (we overwrite ReferenceAttributes unintentionally.

I would add to this that the following case also fails with unknown output attribute:

SELECT col1 AS "x", col1 AS "x" FROM test

Which won’t be fixed, even if I re-alias the parameter literals to their values (literal as ? -> literal as literal). And I cannot use the normal node transform functions to replace their alias either (see above).

Conclusion: No matter what, I think we should fix the way AttributeMap are constructed. We need to remove the AttributeMap(Map) constructor and use the builder that can call the AttrbuteMap#add internally.

Once we did that, we can also think about whether or not to reuse the same ids.

Comment on lines 192 to 193
Attribute attr = a.toAttribute();
collectRefs.put(attr, a.child());
Copy link
Contributor Author

@palesz palesz Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will merge these two lines before squash.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need the checks to pass before merging so you'll have to do it anyway, better do it sooner rather than later.

@palesz
Copy link
Contributor Author

palesz commented Nov 3, 2020

As a separate PR I will also remove the deprecated AttributeMap(Map) ctor.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +51 to +52
sameConstantsWithLimit
SELECT 3, 3, 5 FROM "test_emp" LIMIT 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Work adding a few more variations here:
SELECT 3, 5, 3, SELECT 5, 3, 3, 3 with and without FROM.

Comment on lines 192 to 193
Attribute attr = a.toAttribute();
collectRefs.put(attr, a.child());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need the checks to pass before merging so you'll have to do it anyway, better do it sooner rather than later.

@costin
Copy link
Member

costin commented Nov 3, 2020

Worth creating/linking the issue around SELECT ?, ? and id reuse to this issue.

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thx a lot for this effort!

AttributeMap<Expression> newAttributeMap = mapBuilder.build();

assertTrue(newAttributeMap.containsKey(param1.toAttribute()));
assertTrue(newAttributeMap.containsKey(param2.toAttribute())); // no more unknown attribute exception
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also check the returned values with get() from the map, can be done in the next PR though.

assertNotNull(collectRefs.get(param2.toAttribute()));
AttributeMap<Expression> attributeMap = new AttributeMap<>(collectRefs);

// validate that all Alias can be e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "~ wrapper collided" or some other comment correction (can be the next PR, obv.)

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@palesz palesz merged commit 620cd16 into elastic:master Nov 4, 2020
palesz pushed a commit to palesz/elasticsearch that referenced this pull request Nov 4, 2020
* SQL: Fix incorrect parameter resolution

Summary of the issue and the root cause:

```
(1) SELECT 100, 100 -> success
(2) SELECT ?, ? (with params: 100, 100) -> success
(3) SELECT 100, 100 FROM test -> Unknown output attribute exception for the second 100
(4) SELECT ?, ? FROM test (params: 100, 100) -> Unknown output attribute exception for the second ?
(5) SELECT field1 as "x", field1 as "x" FROM test -> Unknown output attribute exception for the second "x"
```

There are two separate issues at play here:
1. Construction of `AttributeMap`s keeps only one of the `Attribute`s with the same name even if the `id`s are different (see the `AttributeMapTests` in this PR). This should be fixed no matter what, we should not overwrite attributes with one another during the construction of the `AttributeMap`.
2. The `id` on the `Alias`es is not the same in case the `Alias`es have the same `name` and same `child`

It was considered to simpy fix the second issue by just reassigning the same `id`s to the `Alias`es with the same name and child, but it would not solve the `unknown output attribute exception` (see notes below). This PR covers the fix for the first issue.

Fix elastic#56013
palesz pushed a commit to palesz/elasticsearch that referenced this pull request Nov 4, 2020
* SQL: Fix incorrect parameter resolution

Summary of the issue and the root cause:

```
(1) SELECT 100, 100 -> success
(2) SELECT ?, ? (with params: 100, 100) -> success
(3) SELECT 100, 100 FROM test -> Unknown output attribute exception for the second 100
(4) SELECT ?, ? FROM test (params: 100, 100) -> Unknown output attribute exception for the second ?
(5) SELECT field1 as "x", field1 as "x" FROM test -> Unknown output attribute exception for the second "x"
```

There are two separate issues at play here:
1. Construction of `AttributeMap`s keeps only one of the `Attribute`s with the same name even if the `id`s are different (see the `AttributeMapTests` in this PR). This should be fixed no matter what, we should not overwrite attributes with one another during the construction of the `AttributeMap`.
2. The `id` on the `Alias`es is not the same in case the `Alias`es have the same `name` and same `child`

It was considered to simpy fix the second issue by just reassigning the same `id`s to the `Alias`es with the same name and child, but it would not solve the `unknown output attribute exception` (see notes below). This PR covers the fix for the first issue.

Fix elastic#56013
palesz pushed a commit that referenced this pull request Nov 5, 2020
Summary of the issue and the root cause:

```
(1) SELECT 100, 100 -> success
(2) SELECT ?, ? (with params: 100, 100) -> success
(3) SELECT 100, 100 FROM test -> Unknown output attribute exception for the second 100
(4) SELECT ?, ? FROM test (params: 100, 100) -> Unknown output attribute exception for the second ?
(5) SELECT field1 as "x", field1 as "x" FROM test -> Unknown output attribute exception for the second "x"
```

There are two separate issues at play here:
1. Construction of `AttributeMap`s keeps only one of the `Attribute`s with the same name even if the `id`s are different (see the `AttributeMapTests` in this PR). This should be fixed no matter what, we should not overwrite attributes with one another during the construction of the `AttributeMap`.
2. The `id` on the `Alias`es is not the same in case the `Alias`es have the same `name` and same `child`

It was considered to simpy fix the second issue by just reassigning the same `id`s to the `Alias`es with the same name and child, but it would not solve the `unknown output attribute exception` (see notes below). This PR covers the fix for the first issue.

Relates to #56013
@palesz palesz deleted the fix/56013 branch November 5, 2020 01:43
palesz pushed a commit that referenced this pull request Nov 5, 2020
Summary of the issue and the root cause:

```
(1) SELECT 100, 100 -> success
(2) SELECT ?, ? (with params: 100, 100) -> success
(3) SELECT 100, 100 FROM test -> Unknown output attribute exception for the second 100
(4) SELECT ?, ? FROM test (params: 100, 100) -> Unknown output attribute exception for the second ?
(5) SELECT field1 as "x", field1 as "x" FROM test -> Unknown output attribute exception for the second "x"
```

There are two separate issues at play here:
1. Construction of `AttributeMap`s keeps only one of the `Attribute`s with the same name even if the `id`s are different (see the `AttributeMapTests` in this PR). This should be fixed no matter what, we should not overwrite attributes with one another during the construction of the `AttributeMap`.
2. The `id` on the `Alias`es is not the same in case the `Alias`es have the same `name` and same `child`

It was considered to simpy fix the second issue by just reassigning the same `id`s to the `Alias`es with the same name and child, but it would not solve the `unknown output attribute exception` (see notes below). This PR covers the fix for the first issue.

Relates to #56013
@andreidan andreidan added v7.10.0 and removed v7.10.1 labels Nov 6, 2020
palesz pushed a commit that referenced this pull request Nov 9, 2020
Use the `AttributeMap.builder()` instead of the `AttributeMap(Map)` to
construct immutable `AttributeMap`s.

Clean up after the `ctor` deprecation in #63710
palesz pushed a commit to palesz/elasticsearch that referenced this pull request Nov 9, 2020
Use the `AttributeMap.builder()` instead of the `AttributeMap(Map)` to
construct immutable `AttributeMap`s.

Clean up after the `ctor` deprecation in elastic#63710
palesz pushed a commit that referenced this pull request Nov 10, 2020
* SQL: Remove the deprecated `AttributeMap(Map)` calls (#64664)

Use the `AttributeMap.builder()` instead of the `AttributeMap(Map)` to
construct immutable `AttributeMap`s.

Clean up after the `ctor` deprecation in #63710
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL: incorrect column resolution from parameter
8 participants