[partintro]
--

- _Text analysis_ is the process of converting text, like the body of any email,
- into _tokens_ or _terms_ which are added to the inverted index for searching.
- Analysis is performed by an <<analysis-analyzers,_analyzer_>> which can be
- either a built-in analyzer or a <<analysis-custom-analyzer,`custom`>> analyzer
- defined per index.
+ _Text analysis_ is the process of converting unstructured text, like
+ the body of an email or a product description, into a structured format that's
+ optimized for search.

[float]
- == Index time analysis
+ [[when-to-configure-analysis]]
+ === When to configure text analysis

- For instance, at index time the built-in <<english-analyzer,`english`>> _analyzer_
- will first convert the sentence:
+ {es} performs text analysis when indexing or searching <<text,`text`>> fields.

- [source,text]
- ------
- "The QUICK brown foxes jumped over the lazy dog!"
- ------
+ If your index doesn't contain `text` fields, no further setup is needed; you can
+ skip the pages in this section.

- into distinct tokens. It will then lowercase each token, remove frequent
- stopwords ("the") and reduce the terms to their word stems (foxes -> fox,
- jumped -> jump, lazy -> lazi). In the end, the following terms will be added
- to the inverted index:
+ However, if you use `text` fields or your text searches aren't returning results
+ as expected, configuring text analysis can often help. You should also look into
+ analysis configuration if you're using {es} to:

- [source,text]
- ------
- [ quick, brown, fox, jump, over, lazi, dog ]
- ------
+ * Build a search engine
+ * Mine unstructured data
+ * Fine-tune search for a specific language
+ * Perform lexicographic or linguistic research

[float]
- [[specify-index-time-analyzer]]
- === Specifying an index time analyzer
-
- {es} determines which index-time analyzer to use by
- checking the following parameters in order:
-
- . The <<analyzer,`analyzer`>> mapping parameter of the field
- . The `default` analyzer parameter in the index settings
-
- If none of these parameters are specified, the
- <<analysis-standard-analyzer,`standard` analyzer>> is used.
-
- [discrete]
- [[specify-index-time-field-analyzer]]
- ==== Specify the index-time analyzer for a field
-
- Each <<text,`text`>> field in a mapping can specify its own
- <<analyzer,`analyzer`>>:
-
- [source,console]
- -------------------------
- PUT my_index
- {
-   "mappings": {
-     "properties": {
-       "title": {
-         "type": "text",
-         "analyzer": "standard"
-       }
-     }
-   }
- }
- -------------------------
-
- [discrete]
- [[specify-index-time-default-analyzer]]
- ==== Specify a default index-time analyzer
-
- When <<indices-create-index,creating an index>>, you can set a default
- index-time analyzer using the `default` analyzer setting:
-
- [source,console]
- ----
- PUT my_index
- {
-   "settings": {
-     "analysis": {
-       "analyzer": {
-         "default": {
-           "type": "whitespace"
-         }
-       }
-     }
-   }
- }
- ----
-
- A default index-time analyzer is useful when mapping multiple `text` fields that
- use the same analyzer. It's also used as a general fallback analyzer for both
- index-time and search-time analysis.
-
- [float]
- == Search time analysis
-
- This same analysis process is applied to the query string at search time in
- <<full-text-queries,full text queries>> like the
- <<query-dsl-match-query,`match` query>>
- to convert the text in the query string into terms of the same form as those
- that are stored in the inverted index.
-
- For instance, a user might search for:
-
- [source,text]
- ------
- "a quick fox"
- ------
-
- which would be analysed by the same `english` analyzer into the following terms:
-
- [source,text]
- ------
- [ quick, fox ]
- ------
-
- Even though the exact words used in the query string don't appear in the
- original text (`quick` vs `QUICK`, `fox` vs `foxes`), because we have applied
- the same analyzer to both the text and the query string, the terms from the
- query string exactly match the terms from the text in the inverted index,
- which means that this query would match our example document.
-
- [float]
- === Specifying a search time analyzer
-
- Usually the same analyzer should be used both at
- index time and at search time, and <<full-text-queries,full text queries>>
- like the <<query-dsl-match-query,`match` query>> will use the mapping to look
- up the analyzer to use for each field.
-
- The analyzer to use to search a particular field is determined by
- looking for:
-
- * An `analyzer` specified in the query itself.
- * The <<search-analyzer,`search_analyzer`>> mapping parameter.
- * The <<analyzer,`analyzer`>> mapping parameter.
- * An analyzer in the index settings called `default_search`.
- * An analyzer in the index settings called `default`.
- * The `standard` analyzer.
+ [[analysis-toc]]
+ === In this section
+
+ * <<analysis-overview>>
+ * <<analysis-concepts>>
+ * <<configure-text-analysis>>
+ * <<analysis-analyzers>>
+ * <<analysis-tokenizers>>
+ * <<analysis-tokenfilters>>
+ * <<analysis-charfilters>>
+ * <<analysis-normalizers>>

--

@@ -156,5 +55,4 @@ include::analysis/tokenfilters.asciidoc[]

include::analysis/charfilters.asciidoc[]

- include::analysis/normalizers.asciidoc[]
-
+ include::analysis/normalizers.asciidoc[]