|
| 1 | +[[search-as-you-type]] |
| 2 | +=== Search as you type datatype |
| 3 | + |
| 4 | +experimental[] |
| 5 | + |
| 6 | +The `search_as_you_type` field type is a text-like field that is optimized to |
| 7 | +provide out-of-the-box support for queries that serve an as-you-type completion |
| 8 | +use case. It creates a series of subfields that are analyzed to index terms |
| 9 | +that can be efficiently matched by a query that partially matches the entire |
| 10 | +indexed text value. Both prefix completion (i.e. matching terms starting at the |
| 11 | +beginning of the input) and infix completion (i.e. matching terms at any |
| 12 | +position within the input) are supported. |
| 13 | + |
| 14 | +For example, a field of this type can be added to a mapping as follows: |
| 15 | + |
| 16 | +[source,js] |
| 17 | +-------------------------------------------------- |
| 18 | +PUT my_index |
| 19 | +{ |
| 20 | + "mappings": { |
| 21 | + "properties": { |
| 22 | + "my_field": { |
| 23 | + "type": "search_as_you_type" |
| 24 | + } |
| 25 | + } |
| 26 | + } |
| 27 | +} |
| 28 | +-------------------------------------------------- |
| 29 | +// CONSOLE |
| 30 | + |
| 31 | +This creates the following fields: |
| 32 | + |
| 33 | +[horizontal] |
| 34 | + |
| 35 | +`my_field`:: |
| 36 | + |
| 37 | + Analyzed as configured in the mapping. If an analyzer is not configured, |
| 38 | + the default analyzer for the index is used. |
| 39 | + |
| 40 | +`my_field._2gram`:: |
| 41 | + |
| 42 | + Wraps the analyzer of `my_field` with a shingle token filter of shingle |
| 43 | + size 2. |
| 44 | + |
| 45 | +`my_field._3gram`:: |
| 46 | + |
| 47 | + Wraps the analyzer of `my_field` with a shingle token filter of shingle |
| 48 | + size 3. |
| 49 | + |
| 50 | +`my_field._index_prefix`:: |
| 51 | + |
| 52 | + Wraps the analyzer of `my_field._3gram` with an edge ngram token filter. |
| 53 | + |
| 54 | + |
| 55 | +The size of shingles in subfields can be configured with the `max_shingle_size` |
| 56 | +mapping parameter. The default is 3, and valid values for this parameter are |
| 57 | +integer values from 2 to 4 inclusive. Shingle subfields will be created for each |
| 58 | +shingle size from 2 up to and including the `max_shingle_size`. The |
| 59 | +`my_field._index_prefix` subfield will always use the analyzer from the shingle |
| 60 | +subfield with the `max_shingle_size` when constructing its own analyzer. |
| 61 | + |
| 62 | +Increasing the `max_shingle_size` will improve matches for queries with more |
| 63 | +consecutive terms, at the cost of larger index size. The default |
| 64 | +`max_shingle_size` should usually be sufficient. |
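| | + |
| | +As an illustrative sketch, a mapping that raises `max_shingle_size` to 4 might |
| | +look like the following. The index name `my_index_4gram` is hypothetical, and |
| | +the additional `my_field._4gram` subfield name is assumed from the naming |
| | +pattern of the subfields listed above. |
| | + |
| | +[source,js] |
| | +-------------------------------------------------- |
| | +PUT my_index_4gram |
| | +{ |
| | +  "mappings": { |
| | +    "properties": { |
| | +      "my_field": { |
| | +        "type": "search_as_you_type", |
| | +        "max_shingle_size": 4 |
| | +      } |
| | +    } |
| | +  } |
| | +} |
| | +-------------------------------------------------- |
| | +// CONSOLE |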
| 65 | + |
| 66 | +The same input text is indexed into each of these fields automatically, with |
| 67 | +their differing analysis chains, when an indexed document has a value for the |
| 68 | +root field `my_field`. |
| 69 | + |
| 70 | +[source,js] |
| 71 | +-------------------------------------------------- |
| 72 | +PUT my_index/_doc/1?refresh |
| 73 | +{ |
| 74 | + "my_field": "quick brown fox jump lazy dog" |
| 75 | +} |
| 76 | +-------------------------------------------------- |
| 77 | +// CONSOLE |
| 78 | +// TEST[continued] |
| 79 | + |
| 80 | +The most efficient way of querying to serve a search-as-you-type use case is |
| 81 | +usually a <<query-dsl-multi-match-query,`multi_match`>> query of type |
| 82 | +<<query-dsl-match-bool-prefix-query,`bool_prefix`>> that targets the root |
| 83 | +`search_as_you_type` field and its shingle subfields. This can match the query |
| 84 | +terms in any order, but will score documents higher if they contain the terms |
| 85 | +in order in a shingle subfield. |
| 86 | + |
| 87 | +[source,js] |
| 88 | +-------------------------------------------------- |
| 89 | +GET my_index/_search |
| 90 | +{ |
| 91 | + "query": { |
| 92 | + "multi_match": { |
| 93 | + "query": "brown f", |
| 94 | + "type": "bool_prefix", |
| 95 | + "fields": [ |
| 96 | + "my_field", |
| 97 | + "my_field._2gram", |
| 98 | + "my_field._3gram" |
| 99 | + ] |
| 100 | + } |
| 101 | + } |
| 102 | +} |
| 103 | +-------------------------------------------------- |
| 104 | +// CONSOLE |
| 105 | +// TEST[continued] |
| 106 | + |
| 107 | +[source,js] |
| 108 | +-------------------------------------------------- |
| 109 | +{ |
| 110 | + "took" : 44, |
| 111 | + "timed_out" : false, |
| 112 | + "_shards" : { |
| 113 | + "total" : 1, |
| 114 | + "successful" : 1, |
| 115 | + "skipped" : 0, |
| 116 | + "failed" : 0 |
| 117 | + }, |
| 118 | + "hits" : { |
| 119 | + "total" : { |
| 120 | + "value" : 1, |
| 121 | + "relation" : "eq" |
| 122 | + }, |
| 123 | + "max_score" : 0.8630463, |
| 124 | + "hits" : [ |
| 125 | + { |
| 126 | + "_index" : "my_index", |
| 127 | + "_type" : "_doc", |
| 128 | + "_id" : "1", |
| 129 | + "_score" : 0.8630463, |
| 130 | + "_source" : { |
| 131 | + "my_field" : "quick brown fox jump lazy dog" |
| 132 | + } |
| 133 | + } |
| 134 | + ] |
| 135 | + } |
| 136 | +} |
| 137 | +-------------------------------------------------- |
| 138 | +// TESTRESPONSE[s/"took" : 44/"took" : $body.took/] |
| 139 | +// TESTRESPONSE[s/"max_score" : 0.8630463/"max_score" : $body.hits.max_score/] |
| 140 | +// TESTRESPONSE[s/"_score" : 0.8630463/"_score" : $body.hits.hits.0._score/] |
| 141 | + |
| 142 | +To search for documents that strictly match the query terms in order, or to |
| 143 | +search using other properties of phrase queries, use a |
| 144 | +<<query-dsl-match-query-phrase-prefix,`match_phrase_prefix` query>> on the root |
| 145 | +field. A <<query-dsl-match-query-phrase,`match_phrase` query>> can also be used |
| 146 | +if the last term should be matched exactly, and not as a prefix. Using phrase |
| 147 | +queries may be less efficient than using the `match_bool_prefix` query. |
| 148 | + |
| 149 | +[source,js] |
| 150 | +-------------------------------------------------- |
| 151 | +GET my_index/_search |
| 152 | +{ |
| 153 | + "query": { |
| 154 | + "match_phrase_prefix": { |
| 155 | + "my_field": "brown f" |
| 156 | + } |
| 157 | + } |
| 158 | +} |
| 159 | +-------------------------------------------------- |
| 160 | +// CONSOLE |
| 161 | +// TEST[continued] |
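| | + |
| | +When the last term should be matched exactly rather than as a prefix, the same |
| | +search could be sketched with a `match_phrase` query on the root field. The |
| | +query text `brown fox` is only an illustration chosen to match the document |
| | +indexed earlier. |
| | + |
| | +[source,js] |
| | +-------------------------------------------------- |
| | +GET my_index/_search |
| | +{ |
| | +  "query": { |
| | +    "match_phrase": { |
| | +      "my_field": "brown fox" |
| | +    } |
| | +  } |
| | +} |
| | +-------------------------------------------------- |
| | +// CONSOLE |
| | +// TEST[continued] |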
| 162 | + |
| 163 | +[[specific-params]] |
| 164 | +==== Parameters specific to the `search_as_you_type` field |
| 165 | + |
| 166 | +The following parameters are accepted in a mapping for the `search_as_you_type` |
| 167 | +field and are specific to this field type: |
| 168 | + |
| 169 | +[horizontal] |
| 170 | + |
| 171 | +`max_shingle_size`:: |
| 172 | + |
| 173 | + The largest shingle size to index the input with and create subfields for, |
| 174 | + creating one subfield for each shingle size between 2 and |
| 175 | + `max_shingle_size`. Accepts integer values between 2 and 4 inclusive. This |
| 176 | + option defaults to 3. |
| 177 | + |
| 178 | + |
| 179 | +[[general-params]] |
| 180 | +==== Parameters of the field type as a text field |
| 181 | + |
| 182 | +The following parameters are accepted in a mapping for the `search_as_you_type` |
| 183 | +field due to its nature as a text-like field, and behave much as they do when |
| 184 | +configuring a field of the <<text,`text`>> datatype. Unless otherwise noted, |
| 185 | +these options configure the root field's subfields in the same way as the root |
| 186 | +field. |
| 187 | + |
| 188 | +<<analyzer,`analyzer`>>:: |
| 189 | + |
| 190 | + The <<analysis,analyzer>> which should be used for |
| 191 | + <<mapping-index,`analyzed`>> string fields, both at index-time and at |
| 192 | + search-time (unless overridden by the |
| 193 | + <<search-analyzer,`search_analyzer`>>). Defaults to the default index |
| 194 | + analyzer, or the <<analysis-standard-analyzer,`standard` analyzer>>. |
| 195 | + |
| 196 | +<<mapping-index,`index`>>:: |
| 197 | + |
| 198 | + Should the field be searchable? Accepts `true` (default) or `false`. |
| 199 | + |
| 200 | +<<index-options,`index_options`>>:: |
| 201 | + |
| 202 | + What information should be stored in the index, for search and highlighting |
| 203 | + purposes. Defaults to `positions`. |
| 204 | + |
| 205 | +<<norms,`norms`>>:: |
| 206 | + |
| 207 | + Whether field-length should be taken into account when scoring queries. |
| 208 | + Accepts `true` or `false`. This option configures the root field |
| 209 | + and shingle subfields, where its default is `true`. It does not configure |
| 210 | + the prefix subfield, where it is `false`. |
| 211 | + |
| 212 | +<<mapping-store,`store`>>:: |
| 213 | + |
| 214 | + Whether the field value should be stored and retrievable separately from |
| 215 | + the <<mapping-source-field,`_source`>> field. Accepts `true` or `false` |
| 216 | + (default). This option only configures the root field, and does not |
| 217 | + configure any subfields. |
| 218 | + |
| 219 | +<<search-analyzer,`search_analyzer`>>:: |
| 220 | + |
| 221 | + The <<analyzer,`analyzer`>> that should be used at search time on |
| 222 | + <<mapping-index,`analyzed`>> fields. Defaults to the `analyzer` setting. |
| 223 | + |
| 224 | +<<search-quote-analyzer,`search_quote_analyzer`>>:: |
| 225 | + |
| 226 | + The <<analyzer,`analyzer`>> that should be used at search time when a |
| 227 | + phrase is encountered. Defaults to the `search_analyzer` setting. |
| 228 | + |
| 229 | +<<similarity,`similarity`>>:: |
| 230 | + |
| 231 | + Which scoring algorithm or _similarity_ should be used. Defaults |
| 232 | + to `BM25`. |
| 233 | + |
| 234 | +<<term-vector,`term_vector`>>:: |
| 235 | + |
| 236 | + Whether term vectors should be stored for an <<mapping-index,`analyzed`>> |
| 237 | + field. Defaults to `no`. This option configures the root field and shingle |
| 238 | + subfields, but not the prefix subfield. |
| 239 | + |
| 240 | + |
| 241 | +[[prefix-queries]] |
| 242 | +==== Optimization of prefix queries |
| 243 | + |
| 244 | +When making a <<query-dsl-prefix-query,`prefix`>> query to the root field or |
| 245 | +any of its subfields, the query will be rewritten to a |
| 246 | +<<query-dsl-term-query,`term`>> query on the `._index_prefix` subfield. This |
| 247 | +matches more efficiently than is typical of `prefix` queries on text fields, |
| 248 | +as prefixes up to a certain length of each shingle are indexed directly as |
| 249 | +terms in the `._index_prefix` subfield. |
| 250 | + |
| 251 | +The analyzer of the `._index_prefix` subfield slightly modifies the |
| 252 | +shingle-building behavior to also index prefixes of the terms at the end of the |
| 253 | +field's value that normally would not be produced as shingles. For example, if |
| 254 | +the value `quick brown fox` is indexed into a `search_as_you_type` field with |
| 255 | +`max_shingle_size` of 3, prefixes for `brown fox` and `fox` are also indexed |
| 256 | +into the `._index_prefix` subfield even though they do not appear as terms in |
| 257 | +the `._3gram` subfield. This allows for completion of all the terms in the |
| 258 | +field's input. |
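| | + |
| | +As a rough illustration, a `prefix` query such as the following, sent to the |
| | +root field, would be served by the `._index_prefix` subfield as described |
| | +above. The prefix value `bro` is an arbitrary example, and the request assumes |
| | +the `my_index` index and document from the earlier examples. |
| | + |
| | +[source,js] |
| | +-------------------------------------------------- |
| | +GET my_index/_search |
| | +{ |
| | +  "query": { |
| | +    "prefix": { |
| | +      "my_field": "bro" |
| | +    } |
| | +  } |
| | +} |
| | +-------------------------------------------------- |
| | +// CONSOLE |
| | +// TEST[continued] |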