-
Notifications
You must be signed in to change notification settings - Fork 25.2k
EQL: Introduce ~ grammar for case-insensitive functions #67869
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
By default, operators and string comparisons are case sensitive. This PR enables support for case insensitive comparisons through the ~ (tilde) symbol which works on functions. For example: process where ~endsWith(file.path, "ExE") will match all file.path that end with "exe", regardless of their case. In the process, refactored the code by introducing a common class and improving code consistency. Fix elastic#67868
Pinging @elastic/es-ql (Team:QL) |
This PR is a doozy - I realize it's big and so was working on it :) |
Is this the right syntax? I thought we were leaning towards |
Hi @rw-access You are right, now that you mention it I recall that was the outcome. The prefix worked it way as I was modifying the queries and due to ease of use, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some comments.
|
||
return isStringAndExact(pattern, sourceText(), ParamOrdinal.SECOND); | ||
public EndsWith(Source source, Expression input, Expression pattern, boolean caseInsensitive) { | ||
super(source, input, pattern, caseInsensitive); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the fourth parameter here - caseInsensitive
- is in contradiction with the name of the parameter in the superclass' constructor.
@Override | ||
public boolean foldable() { | ||
return input.foldable() && pattern.foldable(); | ||
return new EndsWithFunctionPipe(source(), this, Expressions.pipe(left()), Expressions.pipe(right()), isCaseInsensitive()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is not present in this PR, but EndsWithFunctionPipe
has isCaseSensitive
as a case sensitivity flag. From what I've seen in the PR, you make a case about case-insensitivity. Unless I'm missing something, it would be better if the case sensitivity aspect is consistent throughout the code base (naming wise). Meaning, change isCaseSensitive
to isCaseInsensitive
everywhere in code.
Also, the following classes still have the same caseSensitive
parameter:
StringContains
StringContainsFunctionPipe
BetweenFunctionProcessor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
} | ||
|
||
public Expression right() { | ||
return input; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return input; | |
return right; |
@@ -17,22 +19,29 @@ | |||
|
|||
public static final String NAME = "sbtw"; | |||
|
|||
private final Processor input, left, right, greedy, caseSensitive; | |||
private final Processor input, left, right, greedy; | |||
private final boolean isCaseSensitive; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming of this variable is also inconsistent throughout the code from a different angle: here you have isCaseSensitive
while in other places (for example BetweenFunctionPipe
) it's caseInsensitive
(is
is used in some places while not used in others).
...eql/src/main/java/org/elasticsearch/xpack/eql/expression/function/scalar/string/Between.java
Outdated
Show resolved
Hide resolved
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
''' | ||
expected_event_ids = [88, 92] | ||
description = "check built-in string functions" | ||
|
||
[[queries]] | ||
name = "startsWithCaseInsensitive3" | ||
query = ''' | ||
file where opcode==0 and startsWith(file_name, "expLORER.exe") | ||
file where opcode==0 and startsWith~(file_name, "expLORER.exe") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not introduced by this PR, but: is this test bringing anything new to startsWithCaseInsensitive2
? If so, maybe the name should reflect that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous query checks explorer., the second explorer.exe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both startsWithCaseInsensitive2
and startsWithCaseInsensitive3
test mid-word capitals, but I guess the real difference is that one contains the full word and the other doesn't, which was probably the intend to have them separately. All good.
Edit: so much more as it's been updated :-)
@Override | ||
protected Builder builder() { | ||
return super.builder(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the parent method is protected and is not visible outside classes in the same package - that is nobody outside the package in ql
has access to it. By overriding it, it becomes accessible in the eql package as well.
return def(function, builder, names); | ||
} | ||
|
||
private static Boolean defaultSensitivity(Boolean caseInsensitive) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: my implementation-not-yet-read interpretation is that the function returns the default sensitivity (a constant). Smth in the lines of sensitivityOrDefault(...)
might arguably be clearer.
wildcardQuery = "*" + string + "*"; | ||
} else if (f instanceof EndsWith) { | ||
wildcardQuery = "*" + string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if feasible by now (if already mostly used like that in the codebase), but extracting the wildcard character into a constant might make it cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The String is already internalized by the JVM however I've extracted inside the utils class since we use it there.
The issue with extracting constants is that if they aren't used consistently, they just occupy space.
@@ -33,19 +34,22 @@ | |||
public static final String INTERNAL_EQL_SCRIPT_UTILS = "InternalEqlScriptUtils"; | |||
public static final String INTERNAL_SQL_SCRIPT_UTILS = "InternalSqlScriptUtils"; | |||
|
|||
private Scripts() {} | |||
private static final int PKG_LENGTH = "org.elasticsearch.xpack.".length(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of concerns about deser to/from StreamInput/Output, otherwise only a few nitpicks and minor comments.
TILDE_IDENTIFIER | ||
: LETTER (LETTER | DIGIT | '_')* '~' | ||
; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I expected that the functionName
will be IDENTIFIER '~'?
. Why are function names allowed to start with _
and @
only if the function is not suffixed with ~
?
I guess you don't want to allow ~
for @timestamp
, but what about functions starting with _
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Defining a new token just for function names would overlap with the IDENTIIFER and the grammar ends up applying whoever is declared first due to ambiguity leading to rules not matching and thus broken queries.
public enum EqlFunctionResolution implements FunctionResolutionStrategy { | ||
|
||
/** | ||
* Indicate the function should be insensitive {@code ~startwith()}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Indicate the function should be insensitive {@code ~startwith()}, | |
* Indicate the function should be insensitive {@code startwith~()}, |
[[endswith_sensitive.fold.tests]] | ||
expression = 'endsWith~("FooBarBaz", "baz")' | ||
expected = true | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. The naming is misleading (sensitive as in sensitivity not case sensitive vs insensitive) due to how the spec file was created - which should be changed.
caseInsensitive = in.readBoolean(); | ||
} else { | ||
in.readNamedWriteable(Processor.class); | ||
caseInsensitive = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before it was caseSensitive = Boolean.TRUE
:
Line 48 in c5c3bee
this.caseSensitive = Literal.TRUE; |
Shouldn't this be:
caseInsensitive = true; | |
caseInsensitive = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, before it was caseSensitive true not its caseInsensitive false. Please see again the main comments made on the issue.
} | ||
|
||
public EndsWithFunctionProcessor(StreamInput in) throws IOException { | ||
input = in.readNamedWriteable(Processor.class); | ||
pattern = in.readNamedWriteable(Processor.class); | ||
isCaseSensitive = in.readBoolean(); | ||
isCaseInsensitive = in.readBoolean(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you do a version check here similar to the BetweenFunctionProcessor
? Meaning of the boolean flipped between 7.11 -> 7.12.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that was actually a leftover. The processors in EQL have serialization which we use in testing but there's no serialization on the client.
The SQL ones use it for the cursor however we offer no guarantees, as of yet, on allowing scrolling across nodes with different SQL versions (essentially a rolling upgrade).
return new DummyFunction(l); | ||
}, "DUMMY_FUNCTION"); | ||
FunctionRegistry r = new EqlFunctionRegistry(definition); | ||
FunctionDefinition def = r.resolveFunction(ur.name()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing: assertEquals(ur.source(), ur.buildResolved(randomConfiguration(), def).source());
assertThat(e.getMessage(), endsWith("expects exactly two arguments")); | ||
} | ||
|
||
public void testTernaryCaseAwareWithOptionalFunction() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case needs an overhaul, it doesn't really test what the name suggests and half of the asserts are never tested.
public StartsWithFunctionProcessor(StreamInput in) throws IOException { | ||
source = in.readNamedWriteable(Processor.class); | ||
pattern = in.readNamedWriteable(Processor.class); | ||
isCaseSensitive = in.readBoolean(); | ||
caseInsensitive = in.readBoolean(); | ||
} | ||
|
||
@Override | ||
public final void writeTo(StreamOutput out) throws IOException { | ||
out.writeNamedWriteable(source); | ||
out.writeNamedWriteable(pattern); | ||
out.writeBoolean(isCaseSensitive); | ||
out.writeBoolean(caseInsensitive); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
@@ -60,13 +60,13 @@ public static Object doProcess(Object source, Object pattern, boolean isCaseSens | |||
throw new QlIllegalArgumentException("A string/char is required; received [{}]", pattern); | |||
} | |||
|
|||
if (isCaseSensitive) { | |||
if (caseInsensitive == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flip the if.
randomStringLiteral(), | ||
randomStringLiteral(), | ||
randomStringLiteral(), | ||
randomStringLiteral(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: why did the greedy param change to randomStringLIteral?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
public enum EqlFunctionResolution implements FunctionResolutionStrategy { | ||
|
||
/** | ||
* Indicate the function should be insensitive {@code ~startwith()}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Indicate the function should be insensitive {@code ~startwith()}, | |
* Indicate the function should be insensitive {@code startwith~()}, |
@Override | ||
public boolean foldable() { | ||
return input.foldable() && pattern.foldable(); | ||
return new EndsWithFunctionPipe(source(), this, Expressions.pipe(left()), Expressions.pipe(right()), isCaseInsensitive()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@elasticmachine update branch |
/cc @jrodewig |
By default, operators and string comparisons are case sensitive. This PR enables support for case insensitive comparisons through the ~ (tilde) symbol which works on functions. For example: process where endsWith~(file.path, "ExE") will match all file.path that end with "exe", regardless of their case. In the process, refactored the code by introducing a common class and improving code consistency. Fix #67868 (cherry picked from commit afa4ce0)
By default, operators and string comparisons are case sensitive. This PR
enables support for case insensitive comparisons through the ~ (tilde)
symbol which works on functions.
For example:
process where ~endsWith(file.path, "ExE")
will match all file.path that end with "exe", regardless of their case.
In the process, refactored the code by introducing a common class and
improving code consistency.
Fix #67868