Commit 6bba9fc

search as you type fieldmapper (#35600)
Adds the search_as_you_type field type, which acts like a text field optimized for as-you-type search completion. It creates a couple of subfields that analyze the indexed terms as shingles, against which full terms are queried, and a prefix subfield that analyzes terms as shingles of the largest shingle size used and then as edge-ngrams, against which partial terms are queried.

Adds a match_bool_prefix query type that creates a boolean clause with a term query for each term except the last, for which a boolean clause with a prefix query is created.

The match_bool_prefix query is the recommended way of querying a search_as_you_type field. It boils down to term queries for each shingle of the input text on the appropriate shingle field, plus a term query on the prefix field for the final (possibly partial) term. This field type also supports phrase and phrase prefix queries however
1 parent 2a9ee84 commit 6bba9fc

27 files changed

+5198
-100
lines changed

docs/reference/mapping/types.asciidoc

Lines changed: 3 additions & 0 deletions
@@ -52,6 +52,7 @@ string:: <<text,`text`>> and <<keyword,`keyword`>>
5252

5353
<<sparse-vector>>:: Record sparse vectors of float values.
5454

55+
<<search-as-you-type>>:: A text-like field optimized for queries to implement as-you-type completion
5556

5657
[float]
5758
=== Multi-fields
@@ -110,3 +111,5 @@ include::types/rank-features.asciidoc[]
110111
include::types/dense-vector.asciidoc[]
111112

112113
include::types/sparse-vector.asciidoc[]
114+
115+
include::types/search-as-you-type.asciidoc[]
Lines changed: 258 additions & 0 deletions
@@ -0,0 +1,258 @@
1+
[[search-as-you-type]]
2+
=== Search as you type datatype
3+
4+
experimental[]
5+
6+
The `search_as_you_type` field type is a text-like field that is optimized to
7+
provide out-of-the-box support for queries that serve an as-you-type completion
8+
use case. It creates a series of subfields that are analyzed to index terms
9+
that can be efficiently matched by a query that partially matches the entire
10+
indexed text value. Both prefix completion (i.e. matching terms starting at the
11+
beginning of the input) and infix completion (i.e. matching terms at any
12+
position within the input) are supported.
13+
14+
Consider adding a field of this type to a mapping:
15+
16+
[source,js]
17+
--------------------------------------------------
18+
PUT my_index
19+
{
20+
"mappings": {
21+
"properties": {
22+
"my_field": {
23+
"type": "search_as_you_type"
24+
}
25+
}
26+
}
27+
}
28+
--------------------------------------------------
29+
// CONSOLE
30+
31+
This creates the following fields
32+
33+
[horizontal]
34+
35+
`my_field`::
36+
37+
Analyzed as configured in the mapping. If an analyzer is not configured,
38+
the default analyzer for the index is used
39+
40+
`my_field._2gram`::
41+
42+
Wraps the analyzer of `my_field` with a shingle token filter of shingle
43+
size 2
44+
45+
`my_field._3gram`::
46+
47+
Wraps the analyzer of `my_field` with a shingle token filter of shingle
48+
size 3
49+
50+
`my_field._index_prefix`::
51+
52+
Wraps the analyzer of `my_field._3gram` with an edge ngram token filter
53+
54+
55+
The size of shingles in subfields can be configured with the `max_shingle_size`
56+
mapping parameter. The default is 3, and valid values for this parameter are
57+
integer values 2 - 4 inclusive. Shingle subfields will be created for each
58+
shingle size from 2 up to and including the `max_shingle_size`. The
59+
`my_field._index_prefix` subfield will always use the analyzer from the shingle
60+
subfield with the `max_shingle_size` when constructing its own analyzer.
61+
62+
Increasing the `max_shingle_size` will improve matches for queries with more
63+
consecutive terms, at the cost of larger index size. The default
64+
`max_shingle_size` should usually be sufficient.
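
As a rough illustration of what the shingle subfields hold, the following Python sketch builds fixed-size word shingles from whitespace-split text. The `shingles` and `subfield_terms` helpers and the whitespace tokenization are assumptions made for illustration only. The real subfields wrap whatever analyzer is configured on the root field, so their exact token output may differ.

```python
def shingles(text, size):
    """Return the word shingles made of exactly `size` consecutive terms."""
    terms = text.lower().split()  # stand-in for the root field's analyzer
    return [" ".join(terms[i:i + size]) for i in range(len(terms) - size + 1)]


def subfield_terms(text, max_shingle_size=3):
    """Map each shingle subfield suffix to the shingles it would hold."""
    if not 2 <= max_shingle_size <= 4:
        raise ValueError("max_shingle_size must be between 2 and 4 inclusive")
    # One subfield per shingle size from 2 up to and including max_shingle_size
    return {
        f"_{size}gram": shingles(text, size)
        for size in range(2, max_shingle_size + 1)
    }


print(subfield_terms("quick brown fox jump"))
```

Raising `max_shingle_size` to 4 in this sketch adds a `_4gram` entry, which mirrors the trade-off described above: longer runs of consecutive terms become matchable at the cost of more indexed terms.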
65+
66+
The same input text is indexed into each of these fields automatically, with
67+
their differing analysis chains, when an indexed document has a value for the
68+
root field `my_field`.
69+
70+
[source,js]
71+
--------------------------------------------------
72+
PUT my_index/_doc/1?refresh
73+
{
74+
"my_field": "quick brown fox jump lazy dog"
75+
}
76+
--------------------------------------------------
77+
// CONSOLE
78+
// TEST[continued]
79+
80+
The most efficient way of querying to serve a search-as-you-type use case is
81+
usually a <<query-dsl-multi-match-query,`multi_match`>> query of type
82+
<<query-dsl-match-bool-prefix-query,`bool_prefix`>> that targets the root
83+
`search_as_you_type` field and its shingle subfields. This can match the query
84+
terms in any order, but will score documents higher if they contain the terms
85+
in order in a shingle subfield.
86+
87+
[source,js]
88+
--------------------------------------------------
89+
GET my_index/_search
90+
{
91+
"query": {
92+
"multi_match": {
93+
"query": "brown f",
94+
"type": "bool_prefix",
95+
"fields": [
96+
"my_field",
97+
"my_field._2gram",
98+
"my_field._3gram"
99+
]
100+
}
101+
}
102+
}
103+
--------------------------------------------------
104+
// CONSOLE
105+
// TEST[continued]
106+
107+
[source,js]
108+
--------------------------------------------------
109+
{
110+
"took" : 44,
111+
"timed_out" : false,
112+
"_shards" : {
113+
"total" : 1,
114+
"successful" : 1,
115+
"skipped" : 0,
116+
"failed" : 0
117+
},
118+
"hits" : {
119+
"total" : {
120+
"value" : 1,
121+
"relation" : "eq"
122+
},
123+
"max_score" : 0.8630463,
124+
"hits" : [
125+
{
126+
"_index" : "my_index",
127+
"_type" : "_doc",
128+
"_id" : "1",
129+
"_score" : 0.8630463,
130+
"_source" : {
131+
"my_field" : "quick brown fox jump lazy dog"
132+
}
133+
}
134+
]
135+
}
136+
}
137+
--------------------------------------------------
138+
// TESTRESPONSE[s/"took" : 44/"took" : $body.took/]
139+
// TESTRESPONSE[s/"max_score" : 0.8630463/"max_score" : $body.hits.max_score/]
140+
// TESTRESPONSE[s/"_score" : 0.8630463/"_score" : $body.hits.hits.0._score/]
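
The field list in the `multi_match` request above follows directly from `max_shingle_size`. A minimal Python sketch of building that request body is shown here. The `bool_prefix_request` helper is hypothetical and only assembles a dictionary; it does not talk to a cluster.

```python
import json


def bool_prefix_request(root_field, text, max_shingle_size=3):
    """Build a multi_match body targeting the root field and its shingle subfields."""
    # Subfield names follow the `<root>._<N>gram` pattern described in the docs
    fields = [root_field] + [
        f"{root_field}._{size}gram" for size in range(2, max_shingle_size + 1)
    ]
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "bool_prefix",
                "fields": fields,
            }
        }
    }


print(json.dumps(bool_prefix_request("my_field", "brown f"), indent=2))
```

The resulting dictionary can be serialized and sent as the body of `GET my_index/_search` with any HTTP client.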
141+
142+
To search for documents that strictly match the query terms in order, or to
143+
search using other properties of phrase queries, use a
144+
<<query-dsl-match-query-phrase-prefix,`match_phrase_prefix` query>> on the root
145+
field. A <<query-dsl-match-query-phrase,`match_phrase` query>> can also be used
146+
if the last term should be matched exactly, and not as a prefix. Using phrase
147+
queries may be less efficient than using the `match_bool_prefix` query.
148+
149+
[source,js]
150+
--------------------------------------------------
151+
GET my_index/_search
152+
{
153+
"query": {
154+
"match_phrase_prefix": {
155+
"my_field": "brown f"
156+
}
157+
}
158+
}
159+
--------------------------------------------------
160+
// CONSOLE
161+
// TEST[continued]
162+
163+
[[specific-params]]
164+
==== Parameters specific to the `search_as_you_type` field
165+
166+
The following parameters are accepted in a mapping for the `search_as_you_type`
167+
field and are specific to this field type
168+
169+
[horizontal]
170+
171+
`max_shingle_size`::
172+
173+
The largest shingle size to index the input with and create subfields for,
174+
creating one subfield for each shingle size between 2 and
175+
`max_shingle_size`. Accepts integer values between 2 and 4 inclusive. This
176+
option defaults to 3.
177+
178+
179+
[[general-params]]
180+
==== Parameters of the field type as a text field
181+
182+
The following parameters are accepted in a mapping for the `search_as_you_type`
183+
field due to its nature as a text-like field, and behave similarly to their
184+
behavior when configuring a field of the <<text,`text`>> datatype. Unless
185+
otherwise noted, these options configure the root field's subfields in
186+
the same way.
187+
188+
<<analyzer,`analyzer`>>::
189+
190+
The <<analysis,analyzer>> which should be used for
191+
<<mapping-index,`analyzed`>> string fields, both at index-time and at
192+
search-time (unless overridden by the
193+
<<search-analyzer,`search_analyzer`>>). Defaults to the default index
194+
analyzer, or the <<analysis-standard-analyzer,`standard` analyzer>>.
195+
196+
<<mapping-index,`index`>>::
197+
198+
Should the field be searchable? Accepts `true` (default) or `false`.
199+
200+
<<index-options,`index_options`>>::
201+
202+
What information should be stored in the index, for search and highlighting
203+
purposes. Defaults to `positions`.
204+
205+
<<norms,`norms`>>::
206+
207+
Whether field-length should be taken into account when scoring queries.
208+
Accepts `true` or `false`. This option configures the root field
209+
and shingle subfields, where its default is `true`. It does not configure
210+
the prefix subfield, where it is `false`.
211+
212+
<<mapping-store,`store`>>::
213+
214+
Whether the field value should be stored and retrievable separately from
215+
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
216+
(default). This option only configures the root field, and does not
217+
configure any subfields.
218+
219+
<<search-analyzer,`search_analyzer`>>::
220+
221+
The <<analyzer,`analyzer`>> that should be used at search time on
222+
<<mapping-index,`analyzed`>> fields. Defaults to the `analyzer` setting.
223+
224+
<<search-quote-analyzer,`search_quote_analyzer`>>::
225+
226+
The <<analyzer,`analyzer`>> that should be used at search time when a
227+
phrase is encountered. Defaults to the `search_analyzer` setting.
228+
229+
<<similarity,`similarity`>>::
230+
231+
Which scoring algorithm or _similarity_ should be used. Defaults
232+
to `BM25`.
233+
234+
<<term-vector,`term_vector`>>::
235+
236+
Whether term vectors should be stored for an <<mapping-index,`analyzed`>>
237+
field. Defaults to `no`. This option configures the root field and shingle
238+
subfields, but not the prefix subfield.
239+
240+
241+
[[prefix-queries]]
242+
==== Optimization of prefix queries
243+
244+
When making a <<query-dsl-prefix-query,`prefix`>> query to the root field or
245+
any of its subfields, the query will be rewritten to a
246+
<<query-dsl-term-query,`term`>> query on the `._index_prefix` subfield. This
247+
matches more efficiently than is typical of `prefix` queries on text fields,
248+
as prefixes up to a certain length of each shingle are indexed directly as
249+
terms in the `._index_prefix` subfield.
250+
251+
The analyzer of the `._index_prefix` subfield slightly modifies the
252+
shingle-building behavior to also index prefixes of the terms at the end of the
253+
field's value that normally would not be produced as shingles. For example, if
254+
the value `quick brown fox` is indexed into a `search_as_you_type` field with
255+
`max_shingle_size` of 3, prefixes for `brown fox` and `fox` are also indexed
256+
into the `._index_prefix` subfield even though they do not appear as terms in
257+
the `._3gram` subfield. This allows for completion of all the terms in the
258+
field's input.
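
The idea of indexing prefixes as terms can be sketched in plain Python. This is a simplified model, not the actual analysis chain: the `index_prefix_terms` helper, the whitespace tokenization, and the `max_prefix_len` cap of 20 characters are all assumptions made for illustration.

```python
def index_prefix_terms(text, max_shingle_size=3, max_prefix_len=20):
    """Return the set of prefix terms a simplified prefix subfield would hold."""
    terms = text.lower().split()  # stand-in for the root field's analyzer
    prefixes = set()
    for start in range(len(terms)):
        # Windows near the end of the value contain fewer than
        # max_shingle_size terms, e.g. "brown fox" and "fox".
        window = " ".join(terms[start:start + max_shingle_size])
        for length in range(1, min(len(window), max_prefix_len) + 1):
            prefixes.add(window[:length])
    return prefixes


indexed = index_prefix_terms("quick brown fox")

# A prefix query becomes a plain membership (term) lookup in this model.
print("brown f" in indexed)
print("fo" in indexed)
```

Because every prefix is stored as its own term, answering a prefix query needs only an exact lookup rather than a scan over all terms sharing the prefix.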

docs/reference/query-dsl/full-text-queries.asciidoc

Lines changed: 8 additions & 1 deletion
@@ -18,7 +18,12 @@ The queries in this group are:
1818

1919
<<query-dsl-match-query-phrase-prefix,`match_phrase_prefix` query>>::
2020

21-
The poor man's _search-as-you-type_. Like the `match_phrase` query, but does a wildcard search on the final word.
21+
Like the `match_phrase` query, but does a wildcard search on the final word.
22+
23+
<<query-dsl-match-bool-prefix-query,`match_bool_prefix` query>>::
24+
25+
Creates a `bool` query that matches each term as a `term` query, except for
26+
the last term, which is matched as a `prefix` query
2227

2328
<<query-dsl-multi-match-query,`multi_match` query>>::
2429

@@ -50,6 +55,8 @@ include::match-phrase-query.asciidoc[]
5055

5156
include::match-phrase-prefix-query.asciidoc[]
5257

58+
include::match-bool-prefix-query.asciidoc[]
59+
5360
include::multi-match-query.asciidoc[]
5461

5562
include::common-terms-query.asciidoc[]
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
1+
[[query-dsl-match-bool-prefix-query]]
2+
=== Match Bool Prefix Query
3+
4+
A `match_bool_prefix` query analyzes its input and constructs a
5+
<<query-dsl-bool-query,`bool` query>> from the terms. Each term except the last
6+
is used in a `term` query. The last term is used in a `prefix` query. A
7+
`match_bool_prefix` query such as
8+
9+
[source,js]
10+
--------------------------------------------------
11+
GET /_search
12+
{
13+
"query": {
14+
"match_bool_prefix" : {
15+
"message" : "quick brown f"
16+
}
17+
}
18+
}
19+
--------------------------------------------------
20+
// CONSOLE
21+
22+
where analysis produces the terms `quick`, `brown`, and `f` is similar to the
23+
following `bool` query
24+
25+
[source,js]
26+
--------------------------------------------------
27+
GET /_search
28+
{
29+
"query": {
30+
"bool" : {
31+
"should": [
32+
{ "term": { "message": "quick" }},
33+
{ "term": { "message": "brown" }},
34+
{ "prefix": { "message": "f"}}
35+
]
36+
}
37+
}
38+
}
39+
--------------------------------------------------
40+
// CONSOLE
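
The translation from input text to the `bool` query above can be sketched in Python. The `match_bool_prefix_as_bool` helper is hypothetical, and whitespace splitting stands in for the field's analyzer, so this is an approximation of the rewrite rather than the actual implementation.

```python
def match_bool_prefix_as_bool(field, text):
    """Build the roughly equivalent bool query for a match_bool_prefix input."""
    terms = text.lower().split()  # stand-in for analysis of the query text
    if not terms:
        return {"bool": {"should": []}}
    *full_terms, last = terms
    # Every term except the last becomes a term clause
    clauses = [{"term": {field: term}} for term in full_terms]
    # The final, possibly partial, term becomes a prefix clause
    clauses.append({"prefix": {field: last}})
    return {"bool": {"should": clauses}}


print(match_bool_prefix_as_bool("message", "quick brown f"))
```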
41+
42+
An important difference between the `match_bool_prefix` query and
43+
<<query-dsl-match-query-phrase-prefix,`match_phrase_prefix`>> is that the
44+
`match_phrase_prefix` query matches its terms as a phrase, but the
45+
`match_bool_prefix` query can match its terms in any position. The example
46+
`match_bool_prefix` query above could match a field containing
47+
`quick brown fox`, but it could also match `brown fox quick`. It could also
48+
match a field containing the term `quick`, the term `brown` and a term
49+
starting with `f`, appearing in any position.
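
The order independence can be checked with a small Python model that counts how many clauses of the constructed `bool` query a document would satisfy. The `count_matching_clauses` helper and its whitespace tokenization are assumptions for illustration; real matching depends on the field's analyzer and on scoring.

```python
def count_matching_clauses(doc_text, query_text):
    """Count the term and prefix clauses that a document's terms satisfy."""
    doc_terms = doc_text.lower().split()
    query_terms = query_text.lower().split()
    if not query_terms:
        return 0
    *full_terms, last = query_terms
    # Term clauses: the exact term must appear somewhere in the document
    matched = sum(1 for term in full_terms if term in doc_terms)
    # Prefix clause: any document term may start with the final query term
    if any(term.startswith(last) for term in doc_terms):
        matched += 1
    return matched


for doc in ("quick brown fox", "brown fox quick", "a quick fix"):
    print(doc, "->", count_matching_clauses(doc, "quick brown f"))
```

Position never enters the count, which is why `brown fox quick` satisfies as many clauses as `quick brown fox` in this model.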
50+
51+
==== Parameters
52+
53+
By default, `match_bool_prefix` queries' input text will be analyzed using the
54+
analyzer from the queried field's mapping. A different search analyzer can be
55+
configured with the `analyzer` parameter
56+
57+
[source,js]
58+
--------------------------------------------------
59+
GET /_search
60+
{
61+
"query": {
62+
"match_bool_prefix" : {
63+
"message": {
64+
"query": "quick brown f",
65+
"analyzer": "keyword"
66+
}
67+
}
68+
}
69+
}
70+
--------------------------------------------------
71+
// CONSOLE
72+
73+
`match_bool_prefix` queries support the
74+
<<query-dsl-minimum-should-match,`minimum_should_match`>> and `operator`
75+
parameters as described for the
76+
<<query-dsl-match-query-boolean,`match` query>>, applying the setting to the
77+
constructed `bool` query. The number of clauses in the constructed `bool`
78+
query will in most cases be the number of terms produced by analysis of the
79+
query text.
80+
81+
The <<query-dsl-match-query-fuzziness,`fuzziness`>>, `prefix_length`,
82+
`max_expansions`, `fuzzy_transpositions`, and `fuzzy_rewrite` parameters can
83+
be applied to the `term` subqueries constructed for all terms but the final
84+
term. They do not have any effect on the prefix query constructed for the
85+
final term.
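
As a sketch of how these parameters fit into a request, the following Python helper assembles a `match_bool_prefix` body in the same shape as the `analyzer` example above. The `match_bool_prefix_body` helper is hypothetical; it only accepts the parameter names mentioned in this section and does not validate their values.

```python
import json

# Parameter names taken from this section of the documentation
ALLOWED_PARAMS = {
    "analyzer",
    "minimum_should_match",
    "operator",
    "fuzziness",
    "prefix_length",
    "max_expansions",
    "fuzzy_transpositions",
    "fuzzy_rewrite",
}


def match_bool_prefix_body(field, text, **params):
    """Build a match_bool_prefix request body with optional parameters."""
    unknown = set(params) - ALLOWED_PARAMS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    return {"query": {"match_bool_prefix": {field: {"query": text, **params}}}}


body = match_bool_prefix_body("message", "quick brown f", minimum_should_match=2)
print(json.dumps(body, indent=2))
```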

docs/reference/query-dsl/match-phrase-prefix-query.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -59,6 +59,6 @@ for appears.
5959
6060
For better solutions for _search-as-you-type_ see the
6161
<<search-suggesters-completion,completion suggester>> and
62-
{defguide}/_index_time_search_as_you_type.html[Index-Time Search-as-You-Type].
62+
the <<search-as-you-type,`search_as_you_type` field type>>.
6363
6464
===================================================

docs/reference/query-dsl/match-query.asciidoc

Lines changed: 1 addition & 2 deletions
@@ -202,7 +202,6 @@ process. It does not support field name prefixes, wildcard characters,
202202
or other "advanced" features. For this reason, chances of it failing are
203203
very small / non existent, and it provides an excellent behavior when it
204204
comes to just analyze and run that text as a query behavior (which is
205-
usually what a text search box does). Also, the <<query-dsl-match-query-phrase-prefix,`match_phrase_prefix`>>
206-
type can provide a great "as you type" behavior to automatically load search results.
205+
usually what a text search box does).
207206
208207
**************************************************
