Commit 9d8e249

geekpete authored and MK Swanson committed
[DOCS] path_hierarchy tokenizer examples (#39630)
* [DOCS] path_hierarchy tokenizer examples
* minor typo and some better wording
* Fixed consistency of GETs/PUTs
* Removed superfluous parameter
  - Removed search_analyzer from tokenizer blocks
  - Also fixed a minor typo
* Fixed a syntax error after previous edit
* linked new examples page from others
* fixed page link
* minor formatting fix
1 parent f00f937 commit 9d8e249

3 files changed: +201 -0 lines changed

docs/reference/analysis/tokenizers.asciidoc

+4
@@ -155,3 +155,7 @@ include::tokenizers/simplepattern-tokenizer.asciidoc[]
 include::tokenizers/simplepatternsplit-tokenizer.asciidoc[]
 
 include::tokenizers/pathhierarchy-tokenizer.asciidoc[]
+
+include::tokenizers/pathhierarchy-tokenizer-examples.asciidoc[]
+
+

docs/reference/analysis/tokenizers/pathhierarchy-tokenizer-examples.asciidoc

+193
@@ -0,0 +1,193 @@
[[analysis-pathhierarchy-tokenizer-examples]]
=== Path Hierarchy Tokenizer Examples

A common use case for the `path_hierarchy` tokenizer is filtering results by
file path. If a file path is indexed along with the data, analyzing the path
with the `path_hierarchy` tokenizer allows the results to be filtered by
different parts of the file path string.

This example configures an index with two custom analyzers and applies
those analyzers to multi-fields of the `file_path` text field that will
store filenames. One of the two analyzers uses reverse tokenization.
Some sample documents are then indexed to represent file paths
for photos inside the photo folders of two different users.

[source,js]
--------------------------------------------------
PUT file-path-test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_path_tree": {
          "tokenizer": "custom_hierarchy"
        },
        "custom_path_tree_reversed": {
          "tokenizer": "custom_hierarchy_reversed"
        }
      },
      "tokenizer": {
        "custom_hierarchy": {
          "type": "path_hierarchy",
          "delimiter": "/"
        },
        "custom_hierarchy_reversed": {
          "type": "path_hierarchy",
          "delimiter": "/",
          "reverse": "true"
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "file_path": {
          "type": "text",
          "fields": {
            "tree": {
              "type": "text",
              "analyzer": "custom_path_tree"
            },
            "tree_reversed": {
              "type": "text",
              "analyzer": "custom_path_tree_reversed"
            }
          }
        }
      }
    }
  }
}

POST file-path-test/_doc/1
{
  "file_path": "/User/alice/photos/2017/05/16/my_photo1.jpg"
}

POST file-path-test/_doc/2
{
  "file_path": "/User/alice/photos/2017/05/16/my_photo2.jpg"
}

POST file-path-test/_doc/3
{
  "file_path": "/User/alice/photos/2017/05/16/my_photo3.jpg"
}

POST file-path-test/_doc/4
{
  "file_path": "/User/alice/photos/2017/05/15/my_photo1.jpg"
}

POST file-path-test/_doc/5
{
  "file_path": "/User/bob/photos/2017/05/16/my_photo1.jpg"
}
--------------------------------------------------
// CONSOLE
// TESTSETUP

A `match` query for a file path string against the `file_path` text field
matches all of the example documents, because the standard analyzer splits
both the indexed paths and the query into individual terms. Bob's document
ranks highest because `bob` is also one of those terms, which boosts its
relevance.

[source,js]
--------------------------------------------------
GET file-path-test/_search
{
  "query": {
    "match": {
      "file_path": "/User/bob/photos/2017/05"
    }
  }
}
--------------------------------------------------
// CONSOLE

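To see why every sample document matches this query, it can help to approximate the terms involved. The sketch below is plain Python, not Elasticsearch: `rough_terms` is a hypothetical helper that lowercases the text and splits on anything other than letters, digits, and underscores, which only roughly imitates the standard analyzer. The overlap count is a stand-in for intuition, not the actual relevance score.

```python
import re

# The five sample paths from the setup request, keyed by document id.
DOCS = {
    1: "/User/alice/photos/2017/05/16/my_photo1.jpg",
    2: "/User/alice/photos/2017/05/16/my_photo2.jpg",
    3: "/User/alice/photos/2017/05/16/my_photo3.jpg",
    4: "/User/alice/photos/2017/05/15/my_photo1.jpg",
    5: "/User/bob/photos/2017/05/16/my_photo1.jpg",
}


def rough_terms(text):
    """Hypothetical stand-in for the standard analyzer (an approximation only)."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


query_terms = rough_terms("/User/bob/photos/2017/05")

# How many query terms each document shares. A match query needs only one
# shared term by default, so any non-zero count means the document matches.
overlap = {doc_id: len(query_terms & rough_terms(path)) for doc_id, path in DOCS.items()}
print(overlap)  # {1: 4, 2: 4, 3: 4, 4: 4, 5: 5}
```

Every document shares `user`, `photos`, `2017`, and `05` with the query, and only document 5 also shares `bob`, which is consistent with it ranking first.
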
The `file_path.tree` field makes it simple to match or filter documents whose
file paths fall within a particular directory.

[source,js]
--------------------------------------------------
GET file-path-test/_search
{
  "query": {
    "term": {
      "file_path.tree": "/User/alice/photos/2017/05/16"
    }
  }
}
--------------------------------------------------
// CONSOLE

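The reason an exact `term` lookup can act as a directory filter is that the forward tokenizer emits one token per directory level. The following is a minimal Python sketch of that idea, not the tokenizer's real implementation: `forward_tokens` and the toy `index` dictionary are hypothetical names, and edge cases such as repeated delimiters may behave differently in Elasticsearch.

```python
def forward_tokens(path, delimiter="/"):
    """Emit each leading portion of the path that ends at a delimiter, then the full path."""
    tokens = []
    # Start at index 1 so a leading delimiter stays attached to the first token.
    idx = path.find(delimiter, 1)
    while idx != -1:
        tokens.append(path[:idx])
        idx = path.find(delimiter, idx + 1)
    tokens.append(path)
    return tokens


# The five sample paths from the setup request, keyed by document id.
DOCS = {
    1: "/User/alice/photos/2017/05/16/my_photo1.jpg",
    2: "/User/alice/photos/2017/05/16/my_photo2.jpg",
    3: "/User/alice/photos/2017/05/16/my_photo3.jpg",
    4: "/User/alice/photos/2017/05/15/my_photo1.jpg",
    5: "/User/bob/photos/2017/05/16/my_photo1.jpg",
}

# Toy inverted index: token -> ids of the documents whose path produced it.
index = {}
for doc_id, path in DOCS.items():
    for token in forward_tokens(path):
        index.setdefault(token, []).append(doc_id)

# A term query is not analyzed, so it looks up the exact token.
print(index.get("/User/alice/photos/2017/05/16", []))  # [1, 2, 3]
print(index.get("/User/alice", []))                    # [1, 2, 3, 4]
```

Because only whole directory prefixes become tokens, a partial name such as `/User/ali` would find nothing in this sketch.
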
With the `reverse` parameter for this tokenizer, it's also possible to match
from the other end of the file path, such as individual file names or a
deep-level subdirectory. The following example searches for all files named
`my_photo1.jpg` within any directory via the `file_path.tree_reversed` field,
whose analyzer uses the tokenizer configured with the `reverse` parameter.

[source,js]
--------------------------------------------------
GET file-path-test/_search
{
  "query": {
    "term": {
      "file_path.tree_reversed": {
        "value": "my_photo1.jpg"
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

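The reversed analyzer can be pictured as emitting trailing portions of the path instead of leading ones. The Python sketch below illustrates this under stated assumptions: `reverse_tokens` is a hypothetical helper written for this example, not Elasticsearch's implementation, and it is only checked against simple paths like the ones used here.

```python
def reverse_tokens(path, delimiter="/"):
    """Emit the full path, then each trailing portion that starts after a delimiter."""
    tokens = [path]
    idx = path.find(delimiter)
    while idx != -1:
        suffix = path[idx + 1:]
        if suffix:  # skip the empty remainder after a trailing delimiter
            tokens.append(suffix)
        idx = path.find(delimiter, idx + 1)
    return tokens


# The five sample paths from the setup request, keyed by document id.
DOCS = {
    1: "/User/alice/photos/2017/05/16/my_photo1.jpg",
    2: "/User/alice/photos/2017/05/16/my_photo2.jpg",
    3: "/User/alice/photos/2017/05/16/my_photo3.jpg",
    4: "/User/alice/photos/2017/05/15/my_photo1.jpg",
    5: "/User/bob/photos/2017/05/16/my_photo1.jpg",
}

# Toy inverted index over the reversed tokens.
index = {}
for doc_id, path in DOCS.items():
    for token in reverse_tokens(path):
        index.setdefault(token, []).append(doc_id)

print(index.get("my_photo1.jpg", []))     # [1, 4, 5]
print(index.get("16/my_photo1.jpg", []))  # [1, 5]
```

The bare file name is the last token of each path, so an exact lookup finds it regardless of which directory the file lives in.
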
It is instructive to compare the tokens that the forward and reverse
analyzers generate for the same file path value.

[source,js]
--------------------------------------------------
POST file-path-test/_analyze
{
  "analyzer": "custom_path_tree",
  "text": "/User/alice/photos/2017/05/16/my_photo1.jpg"
}

POST file-path-test/_analyze
{
  "analyzer": "custom_path_tree_reversed",
  "text": "/User/alice/photos/2017/05/16/my_photo1.jpg"
}
--------------------------------------------------
// CONSOLE

Filtering on file paths also combines well with other types of searches.
This example looks for any file paths containing the term `16` that must
also be under Alice's directory, `/User/alice`.

[source,js]
--------------------------------------------------
GET file-path-test/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "file_path": "16" }
      },
      "filter": {
        "term": { "file_path.tree": "/User/alice" }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
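
How the `must` and `filter` clauses narrow the result set can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than Elasticsearch behavior: `rough_terms` loosely imitates the standard analyzer, `tree_tokens` builds directory prefixes for paths that start with `/`, and `search` is a hypothetical helper that ignores scoring entirely.

```python
import re

# The five sample paths from the setup request, keyed by document id.
DOCS = {
    1: "/User/alice/photos/2017/05/16/my_photo1.jpg",
    2: "/User/alice/photos/2017/05/16/my_photo2.jpg",
    3: "/User/alice/photos/2017/05/16/my_photo3.jpg",
    4: "/User/alice/photos/2017/05/15/my_photo1.jpg",
    5: "/User/bob/photos/2017/05/16/my_photo1.jpg",
}


def rough_terms(text):
    """Hypothetical stand-in for the standard analyzer (an approximation only)."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def tree_tokens(path):
    """Directory prefixes of a path; assumes the path starts with '/'."""
    parts = path.split("/")
    return {"/".join(parts[:i]) for i in range(2, len(parts) + 1)}


def search(must_term, filter_token):
    """Return ids of documents that satisfy both clauses (no scoring)."""
    hits = []
    for doc_id, path in DOCS.items():
        in_directory = filter_token in tree_tokens(path)  # term filter on file_path.tree
        has_term = must_term in rough_terms(path)         # match on file_path
        if in_directory and has_term:
            hits.append(doc_id)
    return hits


print(search("16", "/User/alice"))  # [1, 2, 3]
```

Document 4 is under `/User/alice` but has no `16` term, and document 5 has the term but belongs to Bob, so neither is returned.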

docs/reference/analysis/tokenizers/pathhierarchy-tokenizer.asciidoc

+4
@@ -170,3 +170,7 @@ If we were to set `reverse` to `true`, it would produce the following:
 ---------------------------
 [ one/two/three/, two/three/, three/ ]
 ---------------------------
+
+[float]
+=== Detailed Examples
+See <<analysis-pathhierarchy-tokenizer-examples, detailed examples here>>.
