Commit a47c7cd

Add documentation for JSON fields. (#35281)
1 parent daf6c00 commit a47c7cd

2 files changed, +204 -0 lines changed

docs/reference/mapping/types.asciidoc

Lines changed: 6 additions & 0 deletions
@@ -44,8 +44,12 @@ string:: <<text,`text`>> and <<keyword,`keyword`>>

 <<alias>>:: Defines an alias to an existing field.

+<<json>>:: Allows an entire JSON object to be indexed as a single field.
+
 <<rank-feature>>:: Record numeric feature to boost hits at query time.

+<<feature>>:: Record numeric features to boost hits at query time.
+
 <<rank-features>>:: Record numeric features to boost hits at query time.

 <<dense-vector>>:: Record dense vectors of float values.
@@ -87,6 +91,8 @@ include::types/geo-shape.asciidoc[]

 include::types/ip.asciidoc[]

+include::types/json.asciidoc[]
+
 include::types/keyword.asciidoc[]

 include::types/nested.asciidoc[]
Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
[[json]]
=== JSON datatype

experimental[The `json` field type is experimental and may be changed in a breaking way in future releases.]

By default, each subfield in an object is mapped and indexed separately. If
the names or types of the subfields are not known in advance, then they are
<<dynamic-mapping, mapped dynamically>>.

The `json` type provides an alternative approach, where the entire object is
mapped as a single field. Given an object, the `json` mapping will parse out
its leaf values and index them into one field. The object's contents can then
be searched through simple keyword-style queries.

This data type can be useful for indexing objects with a very large number of
distinct keys. Compared to mapping each field separately, `json` fields have
the following advantages:

- Only one field mapping is created for the whole object, which can help
prevent a <<mapping-limit-settings, mappings explosion>> due to a large
number of field mappings.
- A `json` field may take up less space in the index, as only one underlying
field is created.

However, `json` fields present a trade-off in terms of search functionality.
Only basic queries are allowed, with no support for numeric range queries or
aggregations. Further information on the limitations can be found in the
<<supported-operations, Supported operations>> section.

NOTE: The `json` mapping type should **not** be used for indexing all JSON
content, as it provides only limited search functionality. The default
approach, where each subfield has its own entry in the mappings, works well in
the majority of cases.

A `json` field can be created as follows:
[source,js]
--------------------------------
PUT bug_reports
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text"
        },
        "labels": {
          "type": "json"
        }
      }
    }
  }
}

POST bug_reports/_doc/1
{
  "title": "Results are not sorted correctly.",
  "labels": {
    "priority": "urgent",
    "release": ["v1.2.5", "v1.3.0"],
    "timestamp": {
      "created": 1541458026,
      "closed": 1541457010
    }
  }
}
--------------------------------
// CONSOLE
// TESTSETUP

During indexing, tokens are created for each leaf value in the JSON object. The
values are indexed as string keywords, without analysis or special handling for
numbers or dates.
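
As a rough illustration of this indexing step, the Python sketch below walks a parsed JSON object and emits one keyword string per leaf value. The helper name `leaf_values` and the choice to join key paths with dots are assumptions made for the example; this is not the actual indexing code.

```python
def leaf_values(obj, path=()):
    """Yield (key_path, keyword) pairs for every leaf of a parsed JSON value.

    Illustrative only: `leaf_values` is a hypothetical helper, not part of
    Elasticsearch.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from leaf_values(value, path + (key,))
    elif isinstance(obj, list):
        # Array elements share the key of the array itself.
        for item in obj:
            yield from leaf_values(item, path)
    elif obj is not None:
        # Every leaf becomes a plain string; numbers and dates get no special
        # handling. Booleans are checked first so they keep their JSON spelling.
        if isinstance(obj, bool):
            text = "true" if obj else "false"
        else:
            text = str(obj)
        yield ".".join(path), text


labels = {
    "priority": "urgent",
    "release": ["v1.2.5", "v1.3.0"],
    "timestamp": {"created": 1541458026, "closed": 1541457010},
}

tokens = list(leaf_values(labels, path=("labels",)))
for key_path, keyword in tokens:
    print(key_path, "->", keyword)
```

Under these assumptions the two timestamps are emitted as the strings `1541458026` and `1541457010`, which is why they can only be matched as keywords.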

Querying the top-level `json` field searches all leaf values in the object:
[source,js]
--------------------------------
POST bug_reports/_search
{
  "query": {
    "term": {"labels": "urgent"}
  }
}
--------------------------------
// CONSOLE

To query on a specific key in the JSON object, object dot notation is used:
[source,js]
--------------------------------
POST bug_reports/_search
{
  "query": {
    "term": {"labels.release": "v1.3.0"}
  }
}
--------------------------------
// CONSOLE

[[supported-operations]]
==== Supported operations

Currently, `json` fields can be used with the following query types:

- `term`, `terms`, and `terms_set`
- `prefix`
- `range`
- `match` and `multi_match`
- `query_string` and `simple_query_string`
- `exists`

When querying, it is not possible to refer to field keys using wildcards, as in
`{ "term": {"labels.time*": 1541457010}}`. Note that all queries, including
`range`, treat the values as string keywords.
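
The practical effect of string-based `range` queries can be shown with a small Python sketch. It assumes that keyword terms are compared in lexicographic order, which Python's string comparison mirrors for ASCII values; `keyword_range` is a hypothetical helper used only for this illustration.

```python
def keyword_range(values, gte=None, lte=None):
    """Return the values inside [gte, lte] under plain string ordering."""
    return [
        v for v in values
        if (gte is None or v >= gte) and (lte is None or v <= lte)
    ]


# Leaf values as they would look once stored as string keywords.
indexed = ["9", "10", "100", "1541457010", "1541458026"]

# Bounds with the same number of digits behave like a numeric range.
same_width = keyword_range(indexed, gte="1541457000", lte="1541458000")

# Bounds of different widths do not: as strings, "9" sorts after "10".
mixed_width = keyword_range(indexed, gte="9", lte="10")

# A true numeric comparison, shown only for contrast.
numeric = [v for v in indexed if 9 <= int(v) <= 10]

print(same_width)   # ['1541457010']
print(mixed_width)  # []
print(numeric)      # ['9', '10']
```

If numeric ordering matters, one option is to index values of equal width (for example zero-padded), or to map the subfield separately with a numeric type.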

Aggregating, highlighting, or sorting on a `json` field is not supported.

Finally, because of the way leaf values are stored in the index, the null
character `\0` is not allowed to appear in the keys of the JSON object.
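
The text does not spell out the storage format. One plausible reading, treated here purely as an assumption, is that each key and its value are joined into a single term with `\0` as the separator. The Python sketch below uses the hypothetical helpers `encode` and `decode` to show why a key containing the separator would then be ambiguous.

```python
SEPARATOR = "\0"  # assumed separator; not confirmed by the documentation above


def encode(key, value):
    """Join a key and a value into one term, rejecting keys with the separator."""
    if SEPARATOR in key:
        raise ValueError("keys must not contain the null character")
    return key + SEPARATOR + value


def decode(term):
    """Split a term at the first separator back into (key, value)."""
    key, _, value = term.partition(SEPARATOR)
    return key, value


term = encode("release", "v1.3.0")
print(decode(term))  # ('release', 'v1.3.0')

# Without the check in encode(), this term could stand for key "a" with value
# "b\0c" or for key "a\0b" with value "c"; decode() can only pick one reading.
ambiguous = "a" + SEPARATOR + "b" + SEPARATOR + "c"
print(decode(ambiguous))
```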

[[stored-fields]]
==== Stored fields

If the <<mapping-store,`store`>> option is enabled, the entire JSON object will
be stored in pretty-printed format. It can be retrieved through the top-level
`json` field:

[source,js]
--------------------------------
POST bug_reports/_search
{
  "query": { "match": { "title": "results not sorted" }},
  "stored_fields": ["labels"]
}
--------------------------------
// CONSOLE

Field keys cannot be used to load stored content. For example, specifying
`"stored_fields": ["labels.timestamp"]` will return an empty list.

[[json-params]]
==== Parameters for JSON fields

Because of the similarities in the way values are indexed, the `json` type
shares many mapping options with <<keyword, `keyword`>>. The following
parameters are accepted:

[horizontal]

<<mapping-boost,`boost`>>::

Mapping field-level query time boosting. Accepts a floating point number,
defaults to `1.0`.

`depth_limit`::

The maximum allowed depth of the JSON field, in terms of nested inner
objects. If a JSON field exceeds this limit, then an error will be
thrown. Defaults to `20`.

<<ignore-above,`ignore_above`>>::

Leaf values longer than this limit will not be indexed. By default, there
is no limit and all values will be indexed. Note that this limit applies
to the leaf values within the JSON field, and not the length of the entire
field.

<<mapping-index,`index`>>::

Determines if the field should be searchable. Accepts `true` (default) or
`false`.

<<index-options,`index_options`>>::

What information should be stored in the index for scoring purposes.
Defaults to `docs` but can also be set to `freqs` to take term frequency
into account when computing scores.

<<null-value,`null_value`>>::

A string value which is substituted for any explicit `null` values within
the JSON field. Defaults to `null`, which means null fields are treated as
if they were missing.

<<similarity,`similarity`>>::

Which scoring algorithm or _similarity_ should be used. Defaults
to `BM25`.

`split_queries_on_whitespace`::

Whether <<full-text-queries,full text queries>> should split the input on
whitespace when building a query for this field. Accepts `true` or `false`
(default).

<<mapping-store,`store`>>::

Whether the field value should be stored and retrievable separately from
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
(default).
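
To make `depth_limit`, `ignore_above`, and `null_value` more concrete, the Python sketch below models how the three parameters might act on leaf values. It rests on several assumptions that the parameter descriptions do not settle: the top-level object counts as depth 1, length is measured with `len()`, and a substituted `null_value` is not subject to `ignore_above`. The helper `index_leaves` and the `assignee` and `note` keys are hypothetical.

```python
def index_leaves(obj, depth_limit=20, ignore_above=None, null_value=None, _depth=1):
    """Return the keyword strings a hypothetical indexer would keep."""
    if isinstance(obj, dict):
        # Assumption: the outermost object is at depth 1.
        if _depth > depth_limit:
            raise ValueError(f"object exceeds depth_limit of {depth_limit}")
        kept = []
        for value in obj.values():
            kept.extend(
                index_leaves(value, depth_limit, ignore_above, null_value, _depth + 1)
            )
        return kept
    if isinstance(obj, list):
        # Arrays do not add a nesting level in this sketch.
        kept = []
        for item in obj:
            kept.extend(
                index_leaves(item, depth_limit, ignore_above, null_value, _depth)
            )
        return kept
    if obj is None:
        # With no null_value configured, an explicit null is treated as missing.
        return [] if null_value is None else [null_value]
    text = ("true" if obj else "false") if isinstance(obj, bool) else str(obj)
    if ignore_above is not None and len(text) > ignore_above:
        # The limit applies to each leaf value, not to the whole field.
        return []
    return [text]


doc = {
    "priority": "urgent",
    "assignee": None,          # hypothetical key with an explicit null
    "note": "x" * 50,          # hypothetical key with a long leaf value
    "timestamp": {"created": 1541458026},
}

defaults = index_leaves(doc)
tuned = index_leaves(doc, ignore_above=10, null_value="none")
print(defaults)
print(tuned)
```

With the defaults, the null leaf is dropped and the long value is kept; with `ignore_above=10` and `null_value="none"`, the long value is dropped and the null is replaced by the string `none`.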
