Commit 2e9f31e

Limit the analyzed text for highlighting (elastic#27934)
- Introduce the index-level setting "index.highlight.max_analyzed_offset" to control the maximum number of characters to be analyzed for highlighting.
- Leave this setting unset by default (equal to -1).
- Issue a deprecation warning if the setting is unset and analysis is required on a text larger than the ES 7.x maximum (10000).
- Throw IllegalArgumentException if the setting is set by a user and analysis is required on a text larger than the user's set value.

Closes elastic#27517

Adding validator for index.highlight.max_analyzed_offset setting
1 parent 9b9df00 commit 2e9f31e
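
The behaviour listed in the commit message can be read as a small decision function. The following is only an illustrative sketch, not code from this commit: the class `HighlightLimitSketch`, the enum `Outcome`, and the method `check` are hypothetical names, and the `needsAnalysis` flag stands in for "the text was indexed without offsets or term vectors". Only the `-1` sentinel and the `10000` threshold are taken from the commit.

```java
// Hypothetical sketch of the decision described in the commit message.
final class HighlightLimitSketch {
    enum Outcome { OK, DEPRECATION_WARNING, ILLEGAL_ARGUMENT }

    static final int UNSET = -1;               // default value in 6.x (setting not set)
    static final int DEFAULT_LIMIT_7X = 10000; // limit announced for the next major version

    // needsAnalysis: assumed to be true when the field has no offsets or term vectors.
    static Outcome check(boolean needsAnalysis, int maxAnalyzedOffset, int textLength) {
        if (!needsAnalysis) {
            return Outcome.OK; // the limit only applies when the text must be re-analyzed
        }
        if (maxAnalyzedOffset == UNSET && textLength > DEFAULT_LIMIT_7X) {
            return Outcome.DEPRECATION_WARNING; // unset: warn, but still highlight
        }
        if (maxAnalyzedOffset > 0 && textLength > maxAnalyzedOffset) {
            return Outcome.ILLEGAL_ARGUMENT; // explicitly set and exceeded: reject the request
        }
        return Outcome.OK;
    }

    public static void main(String[] args) {
        System.out.println(check(true, UNSET, 20000));
        System.out.println(check(true, 10, 45));
        System.out.println(check(false, 10, 45));
    }
}
```

Under these assumptions, an unset limit never fails a request, while an explicit limit turns an oversized text into an error instead of a warning.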

8 files changed: +54 -31 lines changed

docs/reference/index-modules.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -186,7 +186,7 @@ specific index module:

 The maximum number of characters that will be analyzed for a highlight request.
 This setting is only applicable when highlighting is requested on a text that was indexed without offsets or term vectors.
-Defaults to `10000`.
+By default this settings is unset in 6.x, defaults to `-1`.

 `index.max_terms_count`::

docs/reference/migration/migrate_6_0/analysis.asciidoc

Lines changed: 5 additions & 3 deletions
@@ -16,6 +16,8 @@ created in 5.x.
 Highlighting a text that was indexed without offsets or term vectors,
 requires analysis of this text in memory real time during the search request.
 For large texts this analysis may take substantial amount of time and memory.
-To protect against this, the maximum number of characters that will be analyzed has been
-limited to 10000. This default limit can be changed
-for a particular index with the index setting `index.highlight.max_analyzed_offset`.
+To protect against this, the maximum number of characters that to be analyzed will be
+limited to 10000 in the next major Elastic version. For this version, by default the limit
+is not set. A deprecation warning will be issued when an analyzed text exceeds 10000.
+The limit can be set for a particular index with the index setting
+`index.highlight.max_analyzed_offset`.
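
As a rough illustration of setting the limit for one index, the sketch below builds the JSON body that could be sent to the update index settings endpoint (`PUT /<index>/_settings`). The class and method names are hypothetical, `my-index` and the value `10000` are placeholders, and the `limit >= 1` check is an assumption that mirrors the validator added in this commit. Only the setting key comes from the text above.

```java
// Hypothetical helper: builds a settings body for index.highlight.max_analyzed_offset.
final class MaxAnalyzedOffsetSettingsBody {
    static final String KEY = "index.highlight.max_analyzed_offset";

    static String build(int limit) {
        // Assumption: explicit values below 1 are rejected, as in the commit's validator.
        if (limit < 1) {
            throw new IllegalArgumentException("[" + KEY + "] must be >= 1");
        }
        return "{\"" + KEY + "\": " + limit + "}";
    }

    public static void main(String[] args) {
        // Placeholder index name; the body would be sent with this request line.
        System.out.println("PUT /my-index/_settings");
        System.out.println(build(10000));
    }
}
```

Because the setting is declared with `Property.Dynamic`, it should be possible to change it on an existing index without recreating it.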

docs/reference/search/request/highlighting.asciidoc

Lines changed: 3 additions & 3 deletions
@@ -106,9 +106,9 @@ needs highlighting. The `plain` highlighter always uses plain highlighting.

 [WARNING]
 Plain highlighting for large texts may require substantial amount of time and memory.
-To protect against this, the maximum number of text characters that will be analyzed has been
-limited to 10000. This default limit can be changed
-for a particular index with the index setting `index.highlight.max_analyzed_offset`.
+To protect against this, the maximum number of text characters to be analyzed will be
+limited to 10000 in the next major Elastic version. The default limit is not set for this version,
+but can be set for a particular index with the index setting `index.highlight.max_analyzed_offset`.

 [[highlighting-settings]]
 ==== Highlighting Settings

rest-api-spec/src/main/resources/rest-api-spec/test/search.highlight/30_max_analyzed_offset.yml

Lines changed: 6 additions & 10 deletions
@@ -33,27 +33,24 @@ setup:
   - skip:
       version: " - 6.1.99"
       reason: index.highlight.max_analyzed_offset setting has been added in 6.2
-      features: "warnings"
   - do:
+      catch: bad_request
       search:
           index: test1
           body: {"query" : {"match" : {"field1" : "fox"}}, "highlight" : {"type" : "unified", "fields" : {"field1" : {}}}}
-      warnings:
-          - Deprecated large text to be analyzed for highlighting! The length has exceeded the allowed maximum of [10]. This maximum can be set by changing the [index.highlight.max_analyzed_offset] index level setting. For large texts, indexing with offsets or term vectors is recommended!
-
+  - match: { error.root_cause.0.type: "illegal_argument_exception" }

 ---
 "Plain highlighter on a field WITHOUT OFFSETS exceeding index.highlight.max_analyzed_offset should FAIL":
   - skip:
       version: " - 6.1.99"
       reason: index.highlight.max_analyzed_offset setting has been added in 6.2
-      features: "warnings"
   - do:
+      catch: bad_request
       search:
           index: test1
           body: {"query" : {"match" : {"field1" : "fox"}}, "highlight" : {"type" : "plain", "fields" : {"field1" : {}}}}
-      warnings:
-          - Deprecated large text to be analyzed for highlighting! The length has exceeded the allowed maximum of [10]. This maximum can be set by changing the [index.highlight.max_analyzed_offset] index level setting. For large texts, indexing with offsets or term vectors, and highlighting with unified or fvh highlighter is recommended!
+  - match: { error.root_cause.0.type: "illegal_argument_exception" }

 ---
 "Unified highlighter on a field WITH OFFSETS exceeding index.highlight.max_analyzed_offset should SUCCEED":
@@ -72,10 +69,9 @@ setup:
   - skip:
       version: " - 6.1.99"
       reason: index.highlight.max_analyzed_offset setting has been added in 6.2
-      features: "warnings"
   - do:
+      catch: bad_request
       search:
           index: test1
           body: {"query" : {"match" : {"field2" : "fox"}}, "highlight" : {"type" : "plain", "fields" : {"field2" : {}}}}
-      warnings:
-          - Deprecated large text to be analyzed for highlighting! The length has exceeded the allowed maximum of [10]. This maximum can be set by changing the [index.highlight.max_analyzed_offset] index level setting. For large texts, indexing with offsets or term vectors, and highlighting with unified or fvh highlighter is recommended!
+  - match: { error.root_cause.0.type: "illegal_argument_exception" }

server/src/main/java/org/apache/lucene/search/uhighlight/CustomUnifiedHighlighter.java

Lines changed: 13 additions & 3 deletions
@@ -128,11 +128,21 @@ public Snippet[] highlightField(String field, Query query, int docId, int maxPas
     @Override
     protected List<CharSequence[]> loadFieldValues(String[] fields, DocIdSetIterator docIter,
             int cacheCharsThreshold) throws IOException {
-        if ((offsetSource == OffsetSource.ANALYSIS) && (fieldValue.length() > maxAnalyzedOffset)) {
+        // Issue deprecation warning if maxAnalyzedOffset is not set, and field length > default setting for 7.0
+        final int defaultMaxAnalyzedOffset7 = 10000;
+        if ((offsetSource == OffsetSource.ANALYSIS) && (maxAnalyzedOffset == -1) && (fieldValue.length() > defaultMaxAnalyzedOffset7)) {
             DeprecationLogger deprecationLogger = new DeprecationLogger(Loggers.getLogger(CustomUnifiedHighlighter.class));
             deprecationLogger.deprecated(
-                "Deprecated large text to be analyzed for highlighting! The length has exceeded the allowed maximum of [" +
-                    maxAnalyzedOffset + "]. " + "This maximum can be set by changing the [" +
+                "The length of text to be analyzed for highlighting [" + fieldValue.length() +
+                    "] exceeded the allowed maximum of [" + defaultMaxAnalyzedOffset7 + "] set for the next major Elastic version. " +
+                    "For large texts, indexing with offsets or term vectors is recommended!");
+        }
+        // Throw an error if maxAnalyzedOffset is explicitly set by the user, and field length > maxAnalyzedOffset
+        if ((offsetSource == OffsetSource.ANALYSIS) && (maxAnalyzedOffset > 0) && (fieldValue.length() > maxAnalyzedOffset)) {
+            // maxAnalyzedOffset is not set by user
+            throw new IllegalArgumentException(
+                "The length of text to be analyzed for highlighting [" + fieldValue.length() +
+                    "] exceeded the allowed maximum of [" + maxAnalyzedOffset + "]. This maximum can be set by changing the [" +
                     IndexSettings.MAX_ANALYZED_OFFSET_SETTING.getKey() + "] index level setting. " +
                     "For large texts, indexing with offsets or term vectors is recommended!");
         }

server/src/main/java/org/elasticsearch/index/IndexSettings.java

Lines changed: 10 additions & 4 deletions
@@ -126,11 +126,11 @@ public final class IndexSettings {
      * A setting describing the maximum number of characters that will be analyzed for a highlight request.
      * This setting is only applicable when highlighting is requested on a text that was indexed without
      * offsets or term vectors.
-     * The default maximum of 10000 characters is defensive as for highlighting larger texts,
-     * indexing with offsets or term vectors is recommended.
+     * This setting is defensive as for highlighting larger texts, indexing with offsets or term vectors is recommended.
+     * For 6.x the default value is not set or equals to -1.
      */
     public static final Setting<Integer> MAX_ANALYZED_OFFSET_SETTING =
-        Setting.intSetting("index.highlight.max_analyzed_offset", 10000, 1, Property.Dynamic, Property.IndexScope);
+        Setting.intSetting("index.highlight.max_analyzed_offset", -1, -1, Property.Dynamic, Property.IndexScope);


     /**
@@ -727,7 +727,13 @@ private void setMaxDocvalueFields(int maxDocvalueFields) {
      */
     public int getHighlightMaxAnalyzedOffset() { return this.maxAnalyzedOffset; }

-    private void setHighlightMaxAnalyzedOffset(int maxAnalyzedOffset) { this.maxAnalyzedOffset = maxAnalyzedOffset; }
+    private void setHighlightMaxAnalyzedOffset(int maxAnalyzedOffset) {
+        if (maxAnalyzedOffset < 1) {
+            throw new IllegalArgumentException(
+                "[" + MAX_ANALYZED_OFFSET_SETTING.getKey() + "] must be >= 1");
+        }
+        this.maxAnalyzedOffset = maxAnalyzedOffset;
+    }

     /**
      * Returns the maximum number of terms that can be used in a Terms Query request
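
The two checks in this file use different lower bounds: the setting is declared with a minimum of `-1`, while the setter rejects anything below `1`. The sketch below only compares those two predicates for a few values. The class and method names are hypothetical, and the diff does not show when the setter is invoked (for example, whether it runs for the default value), so no claim is made here about runtime behaviour.

```java
// Hypothetical comparison of the two bounds visible in the IndexSettings diff.
final class MaxAnalyzedOffsetValidationSketch {
    static final int DECLARED_MIN = -1; // minValue passed to Setting.intSetting in the diff

    // Mirrors the declared bound of the setting: value >= -1.
    static boolean withinDeclaredBounds(int value) {
        return value >= DECLARED_MIN;
    }

    // Mirrors the setter's check: values below 1 throw IllegalArgumentException.
    static boolean acceptedBySetter(int value) {
        return value >= 1;
    }

    public static void main(String[] args) {
        for (int value : new int[] {-1, 0, 1, 10000}) {
            System.out.println(value + ": declared bounds=" + withinDeclaredBounds(value)
                + ", setter=" + acceptedBySetter(value));
        }
    }
}
```

Under these assumptions, `-1` and `0` satisfy the declared bounds but would be refused by the setter, which matches the stated intent that an explicitly set limit must be at least 1.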

server/src/main/java/org/elasticsearch/search/fetch/subphase/highlight/PlainHighlighter.java

Lines changed: 15 additions & 6 deletions
@@ -110,17 +110,26 @@ public HighlightField highlight(HighlighterContext highlighterContext) {

         try {
             textsToHighlight = HighlightUtils.loadFieldValues(field, mapper, context, hitContext);
-
+            final int defaultMaxAnalyzedOffset7 = 10000;
             for (Object textToHighlight : textsToHighlight) {
                 String text = convertFieldValue(mapper.fieldType(), textToHighlight);
-                if (text.length() > maxAnalyzedOffset) {
+
+                // Issue deprecation warning if maxAnalyzedOffset is not set, and text length > default setting for 7.0
+                if ((maxAnalyzedOffset == -1) && (text.length() > defaultMaxAnalyzedOffset7)) {
                     DeprecationLogger deprecationLogger = new DeprecationLogger(Loggers.getLogger(PlainHighlighter.class));
                     deprecationLogger.deprecated(
-                        "Deprecated large text to be analyzed for highlighting! The length has exceeded the allowed maximum of [" +
-                            maxAnalyzedOffset + "]. " + "This maximum can be set by changing the [" +
+                        "The length of text to be analyzed for highlighting [" + text.length() + "] exceeded the allowed maximum of [" +
+                            defaultMaxAnalyzedOffset7 + "] set for the next major Elastic version. " +
+                            "For large texts, indexing with offsets or term vectors is recommended!");
+                }
+                // Throw an error if maxAnalyzedOffset is explicitly set by the user, and text length > maxAnalyzedOffset
+                if ((maxAnalyzedOffset > 0) && (text.length() > maxAnalyzedOffset)) {
+                    // maxAnalyzedOffset is not set by user
+                    throw new IllegalArgumentException(
+                        "The length of text to be analyzed for highlighting [" + text.length() +
+                            "] exceeded the allowed maximum of [" + maxAnalyzedOffset + "]. This maximum can be set by changing the [" +
                             IndexSettings.MAX_ANALYZED_OFFSET_SETTING.getKey() + "] index level setting. " +
-                            "For large texts, indexing with offsets or term vectors, and highlighting with unified or " +
-                            "fvh highlighter is recommended!");
+                            "For large texts, indexing with offsets or term vectors is recommended!");
                 }

                 try (TokenStream tokenStream = analyzer.tokenStream(mapper.fieldType().name(), text)) {

server/src/test/java/org/apache/lucene/search/uhighlight/CustomUnifiedHighlighterTests.java

Lines changed: 1 addition & 1 deletion
@@ -79,7 +79,7 @@ private void assertHighlightOneDoc(String fieldName, String[] inputs, Analyzer a
         String rawValue = Strings.arrayToDelimitedString(inputs, String.valueOf(MULTIVAL_SEP_CHAR));
         CustomUnifiedHighlighter highlighter = new CustomUnifiedHighlighter(searcher, analyzer, null,
             new CustomPassageFormatter("<b>", "</b>", new DefaultEncoder()), locale,
-            breakIterator, rawValue, noMatchSize, 10000);
+            breakIterator, rawValue, noMatchSize, -1);
         highlighter.setFieldMatcher((name) -> "text".equals(name));
         final Snippet[] snippets =
             highlighter.highlightField("text", query, topDocs.scoreDocs[0].doc, expectedPassages.length);
