-
Notifications
You must be signed in to change notification settings - Fork 25.2k
SQL: Escaped wildcard (*) not accepted in LIKE #63428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-ql (:Query Languages/SQL) |
int pos = pattern.indexOf('*'); | ||
if (pos >= 0) { | ||
throw new ParsingException(source(ctx.value), | ||
"Invalid char [*] found in pattern [{}] at position {}; use [%] or [_] instead", | ||
pattern, pos); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@costin Did I miss anything, is there a reason to keep this check here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To prevent folks from using *
instead of %
. Whether that's still a concern I don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have anything in the docs regarding the *
and we list only the %
and _
*
should be interpreted as a normal char for SQL:
postgres=# create table mytable (str varchar(10));
CREATE TABLE
postgres=# insert into mytable values('*test*'), ('as*test*fd');
INSERT 0 2
postgres=# select * from mytable where str like ('%*test*%');
str
------------
*test*
as*test*fd
(2 rows)
postgres=# select * from mytable where str like ('*test*');
str
--------
*test*
(1 row)
How about adding a NOTE in the docs to explicitly specify that *
, ?
and the usual regex chars are interpreted as normal chars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK it is only the MS Access SQL flavour that treats *
as a wildcard in a LIKE
pattern, so SQL folks should be fairly used to *
not being a wildcard. ES customers with less SQL knowledge might not. I think adding a NOTE to the docs does not hurt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also thinking about removing the restriction on not being able to use *
as an ESCAPE
character from here. Do you see any good reason to keep that restriction?
As a user *
would not be my first choice as an ESCAPE
character, but the parser can handle it since *
is just a normal character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On PostgreSQL, MySQL, SQLite any character can be used as ESCAPE (even wildchars) and any character can be escaped, not just wildcards. PostgreSQL treats the '%%' ESCAPE '%'
as a non-wildcard %
character (just as another escaped character), while MySQL and SQLite treat it as a wildcard (special escaping case). I assume it was intentional to behave differently and only allow escaping of the wildchars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I see it, since only %
and _
are wildcards for LIKE
I would only allow escaping of those, but if we go on with this change it shouldn't be part of the bugfix that would be backported, but for next minor release since it's a minor "breaking" change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree @matriv that allowing escaping any characters should be a separate PR and likely should be in the next minor release only, although technically it is less of a breaking change unless someone depends on the exceptions around escaping non-wildchars. Anyways, this is off topic for this PR.
To clarify: in this current PR I am only removing the special treatment of *
and the restriction on using *
as an ESCAPE
character (wildchars are still not allowed as ESCAPE
characters). Does this sound good to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do use Markdown as in the initial report (even more so when copy-pasting) - it improves readability a lot.
@@ -29,7 +29,7 @@ private String error(String pattern) { | |||
} | |||
|
|||
private LikePattern like(String pattern) { | |||
Expression exp = null; | |||
Expression exp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. I was wondering what is your convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally speaking, I don't have a strong preference for explicitly initialising as null
but some times it helps for readability. In this case we have a clear if/else statement that assigns a value, so imho is not so important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, Maybe add one more test with a more complicated string, something like:
*%*_string*_**%
. Left one more comment about the error message.
assertThat(error("'|*string' ESCAPE '|'"), | ||
is("line 1:11: Invalid char [*] found in pattern [|*string] at position 1; use [%] or [_] instead")); | ||
is("line 1:11: Pattern [|*string] is invalid as escape char [|] at position 0 can only escape wildcard chars; found [*]")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we maybe improve the error message by listing the wildcard chars for LIKE: %
and _
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, Maybe add one more test with a more complicated string, something like:
*%*_string*_**%
. Left one more comment about the error message.
Will add a new test case, but I will add it to the LikeConversionTests, since I think that has less to do with ESCAPE
ing.
For a query like `SELECT name FROM test WHERE name LIKE ''%c*'` ES SQL generates an error. `*` is not a special character in a `LIKE` construct and it's expected to not needing to be escaped, so the previous query should work as is. In the LIKE pattern any `*` character was treated as invalid character and the usage of `%` or `_` was suggested instead. But `*` is a valid, acceptable non-wildcard on the right side of the `LIKE` operator. Fix: elastic#55108
Sorry for the |
@@ -33,8 +33,8 @@ with the `LIKE` operator: | |||
* The percent sign (%) | |||
* The underscore (_) | |||
|
|||
The percent sign represents zero, one or multiple characters. The underscore represents a single number or character. These symbols can be | |||
used in combinations. | |||
No other characters have special meaning or act as wildcard. The percent sign represents zero, one or multiple characters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I'd add it as a [NOTE]
below the existing sentence, so that it stands out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It might be worth providing first the explanation of the above chars, then add a clarification; i.e. reposition the first sentence at the end.
int pos = pattern.indexOf('*'); | ||
if (pos >= 0) { | ||
throw new ParsingException(source(ctx.value), | ||
"Invalid char [*] found in pattern [{}] at position {}; use [%] or [_] instead", | ||
pattern, pos); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I see it, since only %
and _
are wildcards for LIKE
I would only allow escaping of those, but if we go on with this change it shouldn't be part of the bugfix that would be backported, but for next minor release since it's a minor "breaking" change.
@@ -33,8 +33,8 @@ with the `LIKE` operator: | |||
* The percent sign (%) | |||
* The underscore (_) | |||
|
|||
The percent sign represents zero, one or multiple characters. The underscore represents a single number or character. These symbols can be | |||
used in combinations. | |||
No other characters have special meaning or act as wildcard. The percent sign represents zero, one or multiple characters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It might be worth providing first the explanation of the above chars, then add a clarification; i.e. reposition the first sentence at the end.
Added a NOTE explaining that only `%` and `_` treated as wildcards and other characters (even `*` or `?`) are just treated as normal characters in the `LIKE` pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thx!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too late with the review. Sorry about that. Fwiw LGTM, with a minor cosmetic comment. Thanks.
@@ -288,7 +282,7 @@ public LikePattern visitPattern(PatternContext ctx) { | |||
} else if (escapeString.length() == 1) { | |||
escape = escapeString.charAt(0); | |||
// these chars already have a meaning | |||
if (escape == '*' || escape == '%' || escape == '_') { | |||
if (escape == '%' || escape == '_') { | |||
throw new ParsingException(source(escapeCtx.escape), "Char [{}] cannot be used for escaping", escape); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the same idea (which I like) of actually mentioning the wildcard chars in the error message itself (for the lazy user who doesn't read the docs and asking "why % char cannot be used for escaping?"), what about adapting this error message to include the reason? Something like Char [%] cannot be used for escaping as it's one of the wildcard chars [%_]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, good suggestion. I'll add this cosmetic fix.
Mentions the list of wildchars in case a wildchar is used as an `ESCAPE` character. Relates elastic#63428
For a query like `SELECT name FROM test WHERE name LIKE ''%c*'` ES SQL generates an error. `*` is not a special character in a `LIKE` construct and it's expected to not needing to be escaped, so the previous query should work as is. In the LIKE pattern any `*` character was treated as invalid character and the usage of `%` or `_` was suggested instead. But `*` is a valid, acceptable non-wildcard on the right side of the `LIKE` operator. Fix: #55108 (cherry picked from commit 190d9fe)
For a query like `SELECT name FROM test WHERE name LIKE ''%c*'` ES SQL generates an error. `*` is not a special character in a `LIKE` construct and it's expected to not needing to be escaped, so the previous query should work as is. In the LIKE pattern any `*` character was treated as invalid character and the usage of `%` or `_` was suggested instead. But `*` is a valid, acceptable non-wildcard on the right side of the `LIKE` operator. Fix: #55108 (cherry picked from commit 190d9fe)
For a query like `SELECT name FROM test WHERE name LIKE ''%c*'` ES SQL generates an error. `*` is not a special character in a `LIKE` construct and it's expected to not needing to be escaped, so the previous query should work as is. In the LIKE pattern any `*` character was treated as invalid character and the usage of `%` or `_` was suggested instead. But `*` is a valid, acceptable non-wildcard on the right side of the `LIKE` operator. Fix: elastic#55108
Mentions the list of wildchars in case a wildchar is used as an `ESCAPE` character. Relates elastic#63428
For a query like `SELECT name FROM test WHERE name LIKE ''%c*'` ES SQL generates an error. `*` is not a special character in a `LIKE` construct and it's expected to not needing to be escaped, so the previous query should work as is. In the LIKE pattern any `*` character was treated as invalid character and the usage of `%` or `_` was suggested instead. But `*` is a valid, acceptable non-wildcard on the right side of the `LIKE` operator. Fix: #55108 Backports #63616 #63428
For a query like
SELECT name FROM test WHERE name LIKE '%c*'
ES SQL generates an error.*
is not a special character in aLIKE
construct and it's expected to not needing to be escaped, so the previous query should work as is.In the
LIKE
pattern any*
character was treated as invalid character and the usage of%
or_
was suggested instead. But*
is a valid, acceptable non-wildcard on the right side of theLIKE
operator.Fix: #55108