-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Add query param to limit highlighting to specified length #67325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
43b082e
to
37978d1
Compare
Add a query parameter `limit_to_max_analyzed_offset` to allow users to limit the highlighting of text fields to the value of the `index.highlight.max_analyzed_offset`, thus preventing from throwing an exception when the length of the text field exceeds the limit. The highlighting still takes place but up to the length set by the setting. Relates to: elastic#52155
37978d1
to
ed824b0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Analyzer implementation LGTM, I'll defer to Jim and/or Mark on the API. Does it make sense to have a separate query parameter, rather than doing everything with the index setting? That is, would it be easier to use if we just changed things so that you always get the limiting filter added, and you don't get exceptions any more?
for (int i = 0; i < snippets.length; i++) { | ||
assertEquals(expectedPassages[i], snippets[i].getText()); | ||
} | ||
} finally { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use try-with-resources here I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought about it but we need to tries since, something like that: https://gist.github.com/matriv/ca6ca5fe191c8e0b1e49af282ec90669
so I preferred the traditional way, but happy to change if you prefer the try-with-resources.
@@ -206,6 +211,7 @@ public final void writeTo(StreamOutput out) throws IOException { | |||
out.writeMap(options); | |||
} | |||
out.writeOptionalBoolean(requireFieldMatch); | |||
out.writeOptionalBoolean(limitToMaxAnalyzedOffset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you'll need to do a version check here so that mixed-cluster highlighting works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done it already, thx.
Pinging @elastic/es-search (Team:Search) |
+1 for a separate query parameter. I'd also like to remove that setting but not sure if we need to do it in the same PR. Maybe we can consider this new option independently of the fact that the setting to limit highlighting is not useful ? |
Sorry, I'm a bit confused @jimczi . We should add this query param as is already in this PR, correct? |
Can you describe what's implemented with an example ? What's the relation between the query parameter and the setting ? Do we need both ? My +1 was for a dedicated query parameter that is independent of the setting. |
Taking the perspectives of the various parties involved and what they want:
The existing error message below for exceeding the
It should probably offer an alternative remedy. This is the opportunity for the admin to tell the client app developer to write a better app so we should advertise the new With regards to point 3 - the client app developer might want to tell users when content is truncated. This could be done by testing if the length of the highlighted string is greater than the |
@markharwood I had in mind to have a follow up PR to have this warning/flag regarding the "truncation" of the highlighting, since I agree this is useful to avoid confusion for the user. For "simple" users, I guess they will first try to highlight a large doc, get the error, and then use the flag to highlight up to the limit, so they are aware of what's going on, therefore, of course I need to add the new param to the error message as a suggestion. As you mentioned in point 1. seem we want to have both setting on the index and the per-query param. Maybe, if we have a query param that has a similar name with the setting like: |
Reworded the exception message with the help of @romseygeek, and removed the mention to index offsets and term vectors. |
…69016) Add a `max_analyzed_offset` query parameter to allow users to limit the highlighting of text fields to a value less than or equal to the `index.highlight.max_analyzed_offset`, thus avoiding an exception when the length of the text field exceeds the limit. The highlighting still takes place, but stops at the length defined by the new parameter. Closes: #52155 (cherry picked from commit f9af60b)
Since elastic#69016 is merged versions for serialisation/deserialisation and YAML tests are updated. Follows: elastic#67325
With the newly introduced `max_analyzed_offset` the analyzer of `AnnotatedTextHighlighter` was wrapped twice with the `LimitTokenOffsetAnalyzer` by mistake. Follows: elastic#67325
With the newly introduced `max_analyzed_offset` the analyzer of `AnnotatedTextHighlighter` was wrapped twice with the `LimitTokenOffsetAnalyzer` by mistake. Follows: #67325
Assign a value different that the original when testing mutation of the builder. Follows: elastic#67325
Assign a value different that the original when testing mutation of the builder. Follows: #67325
Add a
max_analyzed_offset
query parameter to allow usersto limit the highlighting of text fields to a value less than or equal to the
index.highlight.max_analyzed_offset
, thus avoiding an exception whenthe length of the text field exceeds the limit. The highlighting still takes place,
but stops at the length defined by the new parameter.
Closes: #52155