Commit acbba64

Add documentation.
1 parent bad2253 commit acbba64

3 files changed

+200
-7
lines changed

@@ -0,0 +1,179 @@
1+
[[query-dsl-combined-fields-query]]
2+
=== Combined fields
3+
++++
4+
<titleabbrev>Combined fields</titleabbrev>
5+
++++
6+
7+
The `combined_fields` query supports searching multiple text fields as if their
8+
contents had been indexed into one combined field. It takes a term-centric
9+
view of the query: first it analyzes the query string into individual terms,
10+
then looks for each term in any of the fields. This query is particularly
11+
useful when a match could span multiple text fields, for example the `title`,
12+
`abstract` and `body` of an article:
13+
14+
[source,console]
15+
--------------------------------------------------
16+
GET /_search
17+
{
18+
"query": {
19+
"combined_fields" : {
20+
"query": "database systems",
21+
"fields": [ "title", "abstract", "body"],
22+
"operator": "and"
23+
}
24+
}
25+
}
26+
--------------------------------------------------
27+
28+
The `combined_fields` query takes a principled approach to scoring based on the
29+
simple BM25F formula described in
30+
http://www.staff.city.ac.uk/~sb317/papers/foundations_bm25_review.pdf[The Probabilistic Relevance Framework: BM25 and Beyond].
31+
When scoring matches, the query combines term and collection statistics across
32+
fields. This allows it to score each match as if the specified fields had been
33+
indexed into a single combined field. (Note that this is a best attempt --
34+
`combined_fields` makes some approximations and scores will not obey this
35+
model perfectly.)
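
As a rough illustration of the combined-field model described above, the following Python sketch scores one term as if the listed fields were a single field. It is not the Elasticsearch or Lucene implementation: the whitespace tokenizer, the `combined_field_bm25` helper and the `k1`/`b` defaults are assumptions chosen for the example, and the real query makes approximations that this sketch ignores.

```python
import math

def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for the shared analyzer.
    return text.lower().split()

def combined_field_bm25(docs, fields, term, k1=1.2, b=0.75):
    """Score each document for one term as if `fields` were one combined field."""
    if not docs:
        return []
    # Treat the listed fields of each document as a single bag of tokens.
    combined = [[tok for f in fields for tok in tokenize(doc.get(f, ""))]
                for doc in docs]
    n_docs = len(combined)
    # Collection statistics are computed over the combined field, not per field.
    avg_len = (sum(len(toks) for toks in combined) / n_docs) or 1.0
    doc_freq = sum(1 for toks in combined if term in toks)
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    scores = []
    for toks in combined:
        tf = toks.count(term)
        norm = k1 * (1 - b + b * len(toks) / avg_len)
        scores.append(idf * tf / (tf + norm) if tf else 0.0)
    return scores

docs = [
    {"title": "database systems", "body": "a survey"},
    {"title": "notes", "body": "database database internals"},
    {"title": "cooking", "body": "pasta recipes"},
]
scores = combined_field_bm25(docs, ["title", "body"], "database")
```

Because term frequency, document length and document frequency are all taken from the combined token list, a term that occurs once in `title` and once in `body` is treated exactly like a term that occurs twice in one field.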
36+
37+
[WARNING]
38+
.Field number limit
39+
===================================================
40+
There is a limit on the number of fields that can be queried at once. It is
41+
defined by the `indices.query.bool.max_clause_count` <<search-settings>>
42+
which defaults to 1024.
43+
===================================================
44+
45+
==== Per-field boosting
46+
47+
Individual fields can be boosted with the caret (`^`) notation:
48+
49+
[source,console]
50+
--------------------------------------------------
51+
GET /_search
52+
{
53+
"query": {
54+
"combined_fields" : {
55+
"query" : "distributed consensus",
56+
"fields" : [ "title^2", "body" ]
57+
}
58+
}
59+
}
60+
--------------------------------------------------
61+
62+
Field boosts are interpreted according to the combined field model. For example,
63+
if the `title` field has a boost of 2, the score is calculated as if each term
64+
in the title appeared twice in the synthetic combined field.
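
The boost interpretation above can be sketched in a few lines of Python. This is only an illustration of the stated model, not the query's actual code: the helper names and the whitespace tokenization are assumptions, and the literal "repeat the field" construction only makes sense for integer boosts.

```python
from collections import Counter

def weighted_term_freqs(doc, boosts):
    """Sum per-field term frequencies, weighting each field by its boost."""
    freqs = Counter()
    for field, boost in boosts.items():
        for token in doc.get(field, "").lower().split():
            freqs[token] += boost
    return freqs

def repeated_field_term_freqs(doc, boosts):
    """Build the synthetic field literally by repeating each field `boost` times."""
    tokens = []
    for field, boost in boosts.items():
        tokens.extend(doc.get(field, "").lower().split() * boost)
    return Counter(tokens)

doc = {"title": "distributed consensus",
       "body": "consensus in distributed systems"}
boosts = {"title": 2, "body": 1}

weighted = weighted_term_freqs(doc, boosts)
repeated = repeated_field_term_freqs(doc, boosts)
```

For integer boosts the two functions yield the same term frequencies; the weighted form also accepts fractional boosts such as `1.5`, where literal repetition is not possible.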
65+
66+
NOTE: The `combined_fields` query requires that field boosts are greater than
67+
or equal to 1.0. Field boosts are allowed to be fractional.
68+
69+
[[combined-field-top-level-params]]
70+
==== Top-level parameters for `combined_fields`
71+
72+
`fields`::
73+
(Required, array of strings) List of fields to search. Field wildcard patterns
74+
are allowed. Only <<text,`text`>> fields are supported, and they must all have
75+
the same search <<analyzer,`analyzer`>>.
76+
77+
`query`::
78+
+
79+
--
80+
(Required, string) Text to search for in the provided `<fields>`.
81+
82+
The `combined_fields` query <<analysis,analyzes>> the provided text before
83+
performing a search.
84+
--
85+
86+
`auto_generate_synonyms_phrase_query`::
87+
+
88+
--
89+
(Optional, Boolean) If `true`, <<query-dsl-match-query-phrase,match phrase>>
90+
queries are automatically created for multi-term synonyms. Defaults to `true`.
91+
92+
See <<query-dsl-match-query-synonyms,Use synonyms with match query>> for an
93+
example.
94+
--
95+
96+
`operator`::
97+
+
98+
--
99+
(Optional, string) Boolean logic used to interpret text in the `query` value.
100+
Valid values are:
101+
102+
`or` (Default)::
103+
For example, a `query` value of `database systems` is interpreted as `database
104+
OR systems`.
105+
106+
`and`::
107+
For example, a `query` value of `database systems` is interpreted as `database
108+
AND systems`.
109+
--
110+
111+
`minimum_should_match`::
112+
+
113+
--
114+
(Optional, string) Minimum number of clauses that must match for a document to
115+
be returned. See the <<query-dsl-minimum-should-match, `minimum_should_match`
116+
parameter>> for valid values and more information.
117+
--
118+
119+
`zero_terms_query`::
120+
+
121+
--
122+
(Optional, string) Indicates whether no documents are returned if the `analyzer`
123+
removes all tokens, such as when using a `stop` filter. Valid values are:
124+
125+
`none` (Default)::
126+
No documents are returned if the `analyzer` removes all tokens.
127+
128+
`all`::
129+
Returns all documents, similar to a <<query-dsl-match-all-query,`match_all`>>
130+
query.
131+
132+
See <<query-dsl-match-query-zero>> for an example.
133+
--
134+
135+
===== Comparison to `multi_match` query
136+
137+
The `combined_fields` query provides a principled way of matching and scoring
138+
across multiple <<text, `text`>> fields. To support this, it requires that all
139+
fields have the same search <<analyzer,`analyzer`>>.
140+
141+
If you want a single query that handles fields of different types like
142+
keywords or numbers, then the <<query-dsl-multi-match-query,`multi_match`>>
143+
query may be a better fit. It supports both text and non-text fields, and
144+
accepts text fields that do not share the same analyzer.
145+
146+
`multi_match` takes a field-centric view of the query by default. In contrast,
147+
`combined_fields` is term-centric: `operator` and `minimum_should_match` are
148+
applied per-term, instead of per-field. Concretely, a query like
149+
150+
[source,console]
151+
--------------------------------------------------
152+
GET /_search
153+
{
154+
"query": {
155+
"combined_fields" : {
156+
"query": "database systems",
157+
"fields": [ "title", "abstract"],
158+
"operator": "and"
159+
}
160+
}
161+
}
162+
--------------------------------------------------
163+
164+
is executed as
165+
166+
+(combined("database", fields:["title", "abstract"]))
167+
+(combined("systems", fields:["title", "abstract"]))
168+
169+
In other words, each term must be present in at least one of the fields for a
170+
document to match.
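
The term-centric behaviour described above can be contrasted with a field-centric one in a small Python sketch. Both functions are hypothetical helpers written for this illustration (they model matching only, not scoring, and use whitespace tokenization as a stand-in for analysis); they are not how either query is implemented.

```python
def field_tokens(doc, field):
    # Assumption: lowercase + whitespace split stands in for the analyzer.
    return set(doc.get(field, "").lower().split())

def term_centric_match(doc, terms, fields, operator="or"):
    """Each term may be satisfied by any of the fields (combined_fields-style)."""
    hits = [any(t in field_tokens(doc, f) for f in fields) for t in terms]
    return all(hits) if operator == "and" else any(hits)

def field_centric_match(doc, terms, fields, operator="or"):
    """The operator is applied inside each field; one matching field suffices."""
    combine = all if operator == "and" else any
    return any(combine(t in field_tokens(doc, f) for t in terms)
               for f in fields)

doc = {"title": "database internals", "abstract": "storage systems"}
terms = ["database", "systems"]
fields = ["title", "abstract"]

term_centric = term_centric_match(doc, terms, fields, operator="and")
field_centric = field_centric_match(doc, terms, fields, operator="and")
```

With `operator` set to `and`, the example document matches in the term-centric model because `database` is found in `title` and `systems` in `abstract`, while the field-centric model rejects it because no single field contains both terms.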
171+
172+
[NOTE]
173+
.Custom similarities
174+
===================================================
175+
The `combined_fields` query currently only supports the `BM25` similarity
176+
(which is the default unless a <<index-modules-similarity, custom similarity>>
177+
is configured). <<similarity, Per-field similarities>> are also not allowed.
178+
Using `combined_fields` in either of these cases will result in an error.
179+
===================================================

docs/reference/query-dsl/full-text-queries.asciidoc

+11-6
@@ -1,9 +1,9 @@
11
[[full-text-queries]]
22
== Full text queries
33

4-
The full text queries enable you to search <<analysis,analyzed text fields>> such as the
5-
body of an email. The query string is processed using the same analyzer that was applied to
6-
the field during indexing.
4+
The full text queries enable you to search <<analysis,analyzed text fields>> such as the
5+
body of an email. The query string is processed using the same analyzer that was applied to
6+
the field during indexing.
77

88
The queries in this group are:
99

@@ -21,13 +21,16 @@ the last term, which is matched as a `prefix` query
2121

2222
<<query-dsl-match-query-phrase,`match_phrase` query>>::
2323
Like the `match` query but used for matching exact phrases or word proximity matches.
24-
24+
2525
<<query-dsl-match-query-phrase-prefix,`match_phrase_prefix` query>>::
2626
Like the `match_phrase` query, but does a wildcard search on the final word.
27-
27+
2828
<<query-dsl-multi-match-query,`multi_match` query>>::
2929
The multi-field version of the `match` query.
3030

31+
<<query-dsl-combined-fields-query,`combined_fields` query>>::
32+
Matches over multiple fields as if they had been indexed into one combined field.
33+
3134
<<query-dsl-query-string-query,`query_string` query>>::
3235
Supports the compact Lucene <<query-string-syntax,query string syntax>>,
3336
allowing you to specify AND|OR|NOT conditions and multi-field search
@@ -48,8 +51,10 @@ include::match-phrase-query.asciidoc[]
4851

4952
include::match-phrase-prefix-query.asciidoc[]
5053

54+
include::combined-fields-query.asciidoc[]
55+
5156
include::multi-match-query.asciidoc[]
5257

5358
include::query-string-query.asciidoc[]
5459

55-
include::simple-query-string-query.asciidoc[]
60+
include::simple-query-string-query.asciidoc[]

docs/reference/query-dsl/multi-match-query.asciidoc

+10-1
@@ -192,7 +192,10 @@ This query is executed as:
192192
In other words, *all terms* must be present *in a single field* for a document
193193
to match.
194194
195-
See <<type-cross-fields>> for a better solution.
195+
The <<query-dsl-combined-fields-query, `combined_fields`>> query offers a
196+
term-centric approach that handles `operator` and `minimum_should_match` on a
197+
per-term basis. The <<type-cross-fields>> type of `multi_match` also
198+
addresses this issue.
196199
197200
===================================================
198201

@@ -385,6 +388,12 @@ explanation:
385388
Also, accepts `analyzer`, `boost`, `operator`, `minimum_should_match`,
386389
`lenient` and `zero_terms_query`.
387390

391+
WARNING: The `cross_fields` type blends field statistics in a way that does
392+
not always produce well-formed scores (for example scores can become
393+
negative). As an alternative, you can consider the
394+
<<query-dsl-combined-fields-query,`combined_fields`>> query, which is also
395+
term-centric but combines field statistics in a more robust way.
396+
388397
[[cross-field-analysis]]
389398
===== `cross_fields` and analysis
390399
