Skip to content

SQL: Escaped wildcard (*) not accepted in LIKE #63428

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 13, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions docs/reference/sql/functions/like-rlike.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -33,8 +33,9 @@ with the `LIKE` operator:
* The percent sign (%)
* The underscore (_)

The percent sign represents zero, one or multiple characters. The underscore represents a single number or character. These symbols can be
used in combinations.
The percent sign represents zero, one or multiple characters. The underscore represents a single number or character. These symbols can be used in combinations.

NOTE: No other characters have special meaning or act as wildcard. Characters often used as wildcards in other languages (`*` or `?`) are treated as normal characters.

[source, sql]
----
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -270,12 +270,6 @@ public LikePattern visitPattern(PatternContext ctx) {
if (pattern == null) {
throw new ParsingException(source(ctx.value), "Pattern must not be [null]");
}
int pos = pattern.indexOf('*');
if (pos >= 0) {
throw new ParsingException(source(ctx.value),
"Invalid char [*] found in pattern [{}] at position {}; use [%] or [_] instead",
pattern, pos);
}
Comment on lines -273 to -278
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@costin Did I miss anything, is there a reason to keep this check here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To prevent folks from using * instead of %. Whether that's still a concern I don't know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have anything in the docs regarding the * and we list only the % and _
* should be interpreted as a normal char for SQL:

postgres=# create table mytable (str varchar(10));
CREATE TABLE

postgres=# insert into mytable values('*test*'), ('as*test*fd');
INSERT 0 2

postgres=# select * from mytable where str like ('%*test*%');
    str
------------
 *test*
 as*test*fd
(2 rows)

postgres=# select * from mytable where str like ('*test*');
  str
--------
 *test*
(1 row)

How about adding a NOTE in the docs to explicitly specify that *, ? and the usual regex chars are interpreted as normal chars?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK it is only the MS Access SQL flavour that treats * as a wildcard in a LIKE pattern, so SQL folks should be fairly used to * not being a wildcard. ES customers with less SQL knowledge might not. I think adding a NOTE to the docs does not hurt.

Copy link
Contributor Author

@palesz palesz Oct 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also thinking about removing the restriction on not being able to use * as an ESCAPE character from here. Do you see any good reason to keep that restriction?
As a user * would not be my first choice as an ESCAPE character, but the parser can handle it since * is just a normal character.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On PostgreSQL, MySQL, SQLite any character can be used as ESCAPE (even wildchars) and any character can be escaped, not just wildcards. PostgreSQL treats the '%%' ESCAPE '%' as a non-wildcard % character (just as another escaped character), while MySQL and SQLite treat it as a wildcard (special escaping case). I assume it was intentional to behave differently and only allow escaping of the wildchars.

Copy link
Contributor

@matriv matriv Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see it, since only % and _ are wildcards for LIKE I would only allow escaping of those, but if we go on with this change it shouldn't be part of the bugfix that would be backported, but for next minor release since it's a minor "breaking" change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree @matriv that allowing escaping any characters should be a separate PR and likely should be in the next minor release only, although technically it is less of a breaking change unless someone depends on the exceptions around escaping non-wildchars. Anyways, this is off topic for this PR.

To clarify: in this current PR I am only removing the special treatment of * and the restriction on using *as an ESCAPE character (wildchars are still not allowed as ESCAPE characters). Does this sound good to you?


char escape = 0;
PatternEscapeContext escapeCtx = ctx.patternEscape();
Expand All @@ -288,7 +282,7 @@ public LikePattern visitPattern(PatternContext ctx) {
} else if (escapeString.length() == 1) {
escape = escapeString.charAt(0);
// these chars already have a meaning
if (escape == '*' || escape == '%' || escape == '_') {
if (escape == '%' || escape == '_') {
throw new ParsingException(source(escapeCtx.escape), "Char [{}] cannot be used for escaping", escape);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the same idea (which I like) of actually mentioning the wildcard chars in the error message itself (for the lazy user who doesn't read the docs and asking "why % char cannot be used for escaping?"), what about adapting this error message to include the reason? Something like Char [%] cannot be used for escaping as it's one of the wildcard chars [%_].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good suggestion. I'll add this cosmetic fix.

}
// lastly validate that escape chars (if present) are followed by special chars
Expand All @@ -303,8 +297,8 @@ public LikePattern visitPattern(PatternContext ctx) {
char next = pattern.charAt(i + 1);
if (next != '%' && next != '_') {
throw new ParsingException(source(ctx.value),
"Pattern [{}] is invalid as escape char [{}] at position {} can only escape wildcard chars; found [{}]",
pattern, escape, i, next);
"Pattern [{}] is invalid as escape char [{}] at position {} can only escape "
+ "wildcard chars [%_]; found [{}]", pattern, escape, i, next);
}
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,12 @@ public class LikeEscapingParsingTests extends ESTestCase {

private final SqlParser parser = new SqlParser();

private static LikePattern patternOfLike(Expression exp) {
assertThat(exp, instanceOf(Like.class));
Like l = (Like) exp;
return l.pattern();
}

private String error(String pattern) {
ParsingException ex = expectThrows(ParsingException.class,
() -> parser.createExpression(format(null, "exp LIKE {}", pattern)));
Expand All @@ -36,9 +42,11 @@ private LikePattern like(String pattern) {
} else {
exp = parser.createExpression(format(null, "exp LIKE '{}'", pattern));
}
assertThat(exp, instanceOf(Like.class));
Like l = (Like) exp;
return l.pattern();
return patternOfLike(exp);
}

private LikePattern like(String pattern, Character escapeChar) {
return patternOfLike(parser.createExpression(format(null, "exp LIKE '{}' ESCAPE '{}'", pattern, escapeChar)));
}

public void testNoEscaping() {
Expand All @@ -55,16 +63,34 @@ public void testEscapingLastChar() {

public void testEscapingWrongChar() {
assertThat(error("'|string' ESCAPE '|'"),
is("line 1:11: Pattern [|string] is invalid as escape char [|] at position 0 can only escape wildcard chars; found [s]"));
is("line 1:11: Pattern [|string] is invalid as escape char [|] at position 0 can only escape "
+ "wildcard chars [%_]; found [s]"));
}

public void testEscapingTheEscapeCharacter() {
assertThat(error("'||string' ESCAPE '|'"),
is("line 1:11: Pattern [||string] is invalid as escape char [|] at position 0 can only escape wildcard chars [%_]; found [|]"));
}

public void testInvalidChar() {
assertThat(error("'%string' ESCAPE '%'"),
is("line 1:28: Char [%] cannot be used for escaping"));
public void testEscapingWildcards() {
assertThat(error("'string' ESCAPE '%'"),
is("line 1:27: Char [%] cannot be used for escaping"));
assertThat(error("'string' ESCAPE '_'"),
is("line 1:27: Char [_] cannot be used for escaping"));
}

public void testCannotUseStar() {
assertThat(error("'|*string' ESCAPE '|'"),
is("line 1:11: Invalid char [*] found in pattern [|*string] at position 1; use [%] or [_] instead"));
public void testCanUseStarWithoutEscaping() {
LikePattern like = like("%string*");
assertThat(like.pattern(), is("%string*"));
assertThat(like.asJavaRegex(), is("^.*string\\*$"));
assertThat(like.asLuceneWildcard(), is("*string\\*"));
}

public void testEscapingWithStar() {
LikePattern like = like("*%%*__string", '*');
assertThat(like.pattern(), is("*%%*__string"));
assertThat(like.asJavaRegex(), is("^%.*_.string$"));
assertThat(like.asLuceneWildcard(), is("%*_?string"));
}

}
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,10 @@ public void testWildcardEscapeLuceneWildcard() {
assertEquals("foo\\*bar*", wildcard("foo*bar%"));
}

public void testStarLiteralWithWildcards() {
assertEquals("\\**\\*?foo\\*\\*?*", wildcard("*%*_foo**_%"));
}

public void testWildcardEscapedWildcard() {
assertEquals("foo\\*bar%", wildcard("foo*bar|%"));
}
Expand Down Expand Up @@ -129,4 +133,4 @@ public void testUnescapeMultipleEscapes() {
assertEquals("foo|_bar|", unescape("foo|||_bar||"));
}

}
}