Skip to content

SQL: Escaped wildcard (*) not accepted in LIKE #63428

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 13, 2020
Merged

Conversation

palesz
Copy link
Contributor

@palesz palesz commented Oct 7, 2020

For a query like SELECT name FROM test WHERE name LIKE '%c*' ES SQL generates an error. * is not a special character in a LIKE construct and it's expected to not needing to be escaped, so the previous query should work as is.
In the LIKE pattern any * character was treated as invalid character and the usage of % or _ was suggested instead. But * is a valid, acceptable non-wildcard on the right side of the LIKE operator.

Fix: #55108

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (:Query Languages/SQL)

@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Oct 7, 2020
@palesz palesz requested review from astefan and matriv October 7, 2020 16:54
@costin costin added v7.10.1 and removed v7.10.0 labels Oct 7, 2020
Comment on lines -273 to -278
int pos = pattern.indexOf('*');
if (pos >= 0) {
throw new ParsingException(source(ctx.value),
"Invalid char [*] found in pattern [{}] at position {}; use [%] or [_] instead",
pattern, pos);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@costin Did I miss anything, is there a reason to keep this check here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To prevent folks from using * instead of %. Whether that's still a concern I don't know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have anything in the docs regarding the * and we list only the % and _
* should be interpreted as a normal char for SQL:

postgres=# create table mytable (str varchar(10));
CREATE TABLE

postgres=# insert into mytable values('*test*'), ('as*test*fd');
INSERT 0 2

postgres=# select * from mytable where str like ('%*test*%');
    str
------------
 *test*
 as*test*fd
(2 rows)

postgres=# select * from mytable where str like ('*test*');
  str
--------
 *test*
(1 row)

How about adding a NOTE in the docs to explicitly specify that *, ? and the usual regex chars are interpreted as normal chars?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK it is only the MS Access SQL flavour that treats * as a wildcard in a LIKE pattern, so SQL folks should be fairly used to * not being a wildcard. ES customers with less SQL knowledge might not. I think adding a NOTE to the docs does not hurt.

Copy link
Contributor Author

@palesz palesz Oct 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also thinking about removing the restriction on not being able to use * as an ESCAPE character from here. Do you see any good reason to keep that restriction?
As a user * would not be my first choice as an ESCAPE character, but the parser can handle it since * is just a normal character.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On PostgreSQL, MySQL, SQLite any character can be used as ESCAPE (even wildchars) and any character can be escaped, not just wildcards. PostgreSQL treats the '%%' ESCAPE '%' as a non-wildcard % character (just as another escaped character), while MySQL and SQLite treat it as a wildcard (special escaping case). I assume it was intentional to behave differently and only allow escaping of the wildchars.

Copy link
Contributor

@matriv matriv Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see it, since only % and _ are wildcards for LIKE I would only allow escaping of those, but if we go on with this change it shouldn't be part of the bugfix that would be backported, but for next minor release since it's a minor "breaking" change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree @matriv that allowing escaping any characters should be a separate PR and likely should be in the next minor release only, although technically it is less of a breaking change unless someone depends on the exceptions around escaping non-wildchars. Anyways, this is off topic for this PR.

To clarify: in this current PR I am only removing the special treatment of * and the restriction on using *as an ESCAPE character (wildchars are still not allowed as ESCAPE characters). Does this sound good to you?

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do use Markdown as in the initial report (even more so when copy-pasting) - it improves readability a lot.

@@ -29,7 +29,7 @@ private String error(String pattern) {
}

private LikePattern like(String pattern) {
Expression exp = null;
Expression exp;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. I was wondering what is your convention.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking, I don't have a strong preference for explicitly initialising as null but some times it helps for readability. In this case we have a clear if/else statement that assigns a value, so imho is not so important.

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, Maybe add one more test with a more complicated string, something like:
*%*_string*_**%. Left one more comment about the error message.

assertThat(error("'|*string' ESCAPE '|'"),
is("line 1:11: Invalid char [*] found in pattern [|*string] at position 1; use [%] or [_] instead"));
is("line 1:11: Pattern [|*string] is invalid as escape char [|] at position 0 can only escape wildcard chars; found [*]"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe improve the error message by listing the wildcard chars for LIKE: % and _ ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, Maybe add one more test with a more complicated string, something like:
*%*_string*_**%. Left one more comment about the error message.

Will add a new test case, but I will add it to the LikeConversionTests, since I think that has less to do with ESCAPEing.

For a query like `SELECT name FROM test WHERE name LIKE ''%c*'` ES SQL
generates an error. `*` is not a special character in a `LIKE` construct
and it's expected to not needing to be escaped, so the previous query
should work as is.
In the LIKE pattern any `*` character was treated as invalid character
and the usage of `%` or `_` was suggested instead. But `*` is a valid,
acceptable non-wildcard on the right side of the `LIKE` operator.

Fix: elastic#55108
@palesz
Copy link
Contributor Author

palesz commented Oct 7, 2020

Sorry for the force-push, I should not have done that.

@@ -33,8 +33,8 @@ with the `LIKE` operator:
* The percent sign (%)
* The underscore (_)

The percent sign represents zero, one or multiple characters. The underscore represents a single number or character. These symbols can be
used in combinations.
No other characters have special meaning or act as wildcard. The percent sign represents zero, one or multiple characters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd add it as a [NOTE] below the existing sentence, so that it stands out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It might be worth providing first the explanation of the above chars, then add a clarification; i.e. reposition the first sentence at the end.

Comment on lines -273 to -278
int pos = pattern.indexOf('*');
if (pos >= 0) {
throw new ParsingException(source(ctx.value),
"Invalid char [*] found in pattern [{}] at position {}; use [%] or [_] instead",
pattern, pos);
}
Copy link
Contributor

@matriv matriv Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see it, since only % and _ are wildcards for LIKE I would only allow escaping of those, but if we go on with this change it shouldn't be part of the bugfix that would be backported, but for next minor release since it's a minor "breaking" change.

@@ -33,8 +33,8 @@ with the `LIKE` operator:
* The percent sign (%)
* The underscore (_)

The percent sign represents zero, one or multiple characters. The underscore represents a single number or character. These symbols can be
used in combinations.
No other characters have special meaning or act as wildcard. The percent sign represents zero, one or multiple characters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It might be worth providing first the explanation of the above chars, then add a clarification; i.e. reposition the first sentence at the end.

Andras Palinkas added 3 commits October 8, 2020 11:58
Added a NOTE explaining that only `%` and `_` treated as wildcards
and other characters (even `*` or `?`) are just treated as normal
characters in the `LIKE` pattern.
Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thx!

@palesz palesz merged commit 190d9fe into elastic:master Oct 13, 2020
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too late with the review. Sorry about that. Fwiw LGTM, with a minor cosmetic comment. Thanks.

@@ -288,7 +282,7 @@ public LikePattern visitPattern(PatternContext ctx) {
} else if (escapeString.length() == 1) {
escape = escapeString.charAt(0);
// these chars already have a meaning
if (escape == '*' || escape == '%' || escape == '_') {
if (escape == '%' || escape == '_') {
throw new ParsingException(source(escapeCtx.escape), "Char [{}] cannot be used for escaping", escape);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the same idea (which I like) of actually mentioning the wildcard chars in the error message itself (for the lazy user who doesn't read the docs and asking "why % char cannot be used for escaping?"), what about adapting this error message to include the reason? Something like Char [%] cannot be used for escaping as it's one of the wildcard chars [%_].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good suggestion. I'll add this cosmetic fix.

palesz pushed a commit to palesz/elasticsearch that referenced this pull request Oct 13, 2020
Mentions the list of wildchars in case a wildchar is used as an
`ESCAPE` character.

Relates elastic#63428
palesz pushed a commit that referenced this pull request Oct 13, 2020
Mentions the list of wildchars in case a wildchar is used as an
`ESCAPE` character.

Relates #63428
palesz added a commit that referenced this pull request Oct 13, 2020
For a query like `SELECT name FROM test WHERE name LIKE ''%c*'` ES SQL
generates an error. `*` is not a special character in a `LIKE` construct
and it's expected to not needing to be escaped, so the previous query
should work as is.
In the LIKE pattern any `*` character was treated as invalid character
and the usage of `%` or `_` was suggested instead. But `*` is a valid,
acceptable non-wildcard on the right side of the `LIKE` operator.

Fix: #55108
(cherry picked from commit 190d9fe)
palesz pushed a commit that referenced this pull request Oct 13, 2020
Mentions the list of wildchars in case a wildchar is used as an
`ESCAPE` character.

Relates #63428

(cherry picked from commit 74cbcf8)
palesz added a commit that referenced this pull request Oct 13, 2020
For a query like `SELECT name FROM test WHERE name LIKE ''%c*'` ES SQL
generates an error. `*` is not a special character in a `LIKE` construct
and it's expected to not needing to be escaped, so the previous query
should work as is.
In the LIKE pattern any `*` character was treated as invalid character
and the usage of `%` or `_` was suggested instead. But `*` is a valid,
acceptable non-wildcard on the right side of the `LIKE` operator.

Fix: #55108
(cherry picked from commit 190d9fe)
palesz pushed a commit that referenced this pull request Oct 13, 2020
Mentions the list of wildchars in case a wildchar is used as an
`ESCAPE` character.

Relates #63428

(cherry picked from commit 74cbcf8)
@andreidan andreidan added v7.10.0 and removed v7.10.1 labels Oct 15, 2020
palesz added a commit to palesz/elasticsearch that referenced this pull request Oct 19, 2020
For a query like `SELECT name FROM test WHERE name LIKE ''%c*'` ES SQL
generates an error. `*` is not a special character in a `LIKE` construct
and it's expected to not needing to be escaped, so the previous query
should work as is.
In the LIKE pattern any `*` character was treated as invalid character
and the usage of `%` or `_` was suggested instead. But `*` is a valid,
acceptable non-wildcard on the right side of the `LIKE` operator.

Fix: elastic#55108
palesz pushed a commit to palesz/elasticsearch that referenced this pull request Oct 19, 2020
Mentions the list of wildchars in case a wildchar is used as an
`ESCAPE` character.

Relates elastic#63428
palesz pushed a commit that referenced this pull request Oct 20, 2020
For a query like `SELECT name FROM test WHERE name LIKE ''%c*'` ES SQL
generates an error. `*` is not a special character in a `LIKE` construct
and it's expected to not needing to be escaped, so the previous query
should work as is.
In the LIKE pattern any `*` character was treated as invalid character
and the usage of `%` or `_` was suggested instead. But `*` is a valid,
acceptable non-wildcard on the right side of the `LIKE` operator.

Fix: #55108
Backports #63616 #63428
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL: escaped wildcard (*) not accepted in LIKE pattern
8 participants