Add documentation.

jtibshirani · jtibshirani · commit acbba646924e · 2021-04-01T21:29:57.000-07:00
diff --git a/docs/reference/query-dsl/combined-fields-query.asciidoc b/docs/reference/query-dsl/combined-fields-query.asciidoc
@@ -0,0 +1,179 @@
+[[query-dsl-combined-fields-query]]
+=== Combined fields
+++++
+<titleabbrev>Combined fields</titleabbrev>
+++++
+
+The `combined_fields` query supports searching multiple text fields as if their
+contents had been indexed into one combined field. It takes a term-centric
+view of the query: first it analyzes the query string into individual terms,
+then looks for each term in any of the fields. This query is particularly
+useful when a match could span multiple text fields, for example the `title`,
+`abstract` and `body` of an article:
+
+[source,console]
+--------------------------------------------------
+GET /_search
+{
+  "query": {
+    "combined_fields" : {
+      "query":      "database systems",
+      "fields":     [ "title", "abstract", "body"],
+      "operator":   "and"
+    }
+  }
+}
+--------------------------------------------------
+
+The `combined_fields` query takes a principled approach to scoring based on the
+simple BM25F formula described in
+http://www.staff.city.ac.uk/~sb317/papers/foundations_bm25_review.pdf[The Probabilistic Relevance Framework: BM25 and Beyond].
+When scoring matches, the query combines term and collection statistics across
+fields. This allows it to score each match as if the specified fields had been
+indexed into a single combined field. (Note that this is a best attempt --
+`combined_fields` makes some approximations and scores will not obey this
+model perfectly.)
+
+[WARNING]
+.Field number limit
+===================================================
+There is a limit on the number of fields that can be queried at once. It is
+defined by the `indices.query.bool.max_clause_count` <<search-settings,search setting>>,
+which defaults to 1024.
+===================================================
+
+==== Per-field boosting
+
+Individual fields can be boosted with the caret (`^`) notation:
+
+[source,console]
+--------------------------------------------------
+GET /_search
+{
+  "query": {
+    "combined_fields" : {
+      "query" : "distributed consensus",
+      "fields" : [ "title^2", "body" ]
+    }
+  }
+}
+--------------------------------------------------
+
+Field boosts are interpreted according to the combined field model. For example,
+if the `title` field has a boost of 2, the score is calculated as if each term
+in the title appeared twice in the synthetic combined field.
+
+NOTE: The `combined_fields` query requires that field boosts are greater than
+or equal to 1.0. Field boosts are allowed to be fractional.
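+
+As a minimal sketch of these constraints, the following request applies a
+fractional boost of 1.5 to `title`. The boost value and field names are
+chosen only for illustration.
+
+[source,console]
+--------------------------------------------------
+GET /_search
+{
+  "query": {
+    "combined_fields" : {
+      "query" : "distributed consensus",
+      "fields" : [ "title^1.5", "body" ]
+    }
+  }
+}
+--------------------------------------------------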
+
+[[combined-field-top-level-params]]
+==== Top-level parameters for `combined_fields`
+
+`fields`::
+(Required, array of strings) List of fields to search. Field wildcard patterns
+are allowed. Only <<text,`text`>> fields are supported, and they must all have
+the same search <<analyzer,`analyzer`>>.
+
+`query`::
++
+--
+(Required, string) Text to search for in the provided `fields`.
+
+The `combined_fields` query <<analysis,analyzes>> the provided text before
+performing a search.
+--
+
+`auto_generate_synonyms_phrase_query`::
++
+--
+(Optional, Boolean) If `true`, <<query-dsl-match-query-phrase,match phrase>>
+queries are automatically created for multi-term synonyms. Defaults to `true`.
+
+See <<query-dsl-match-query-synonyms,Use synonyms with match query>> for an
+example.
+--
+
+`operator`::
++
+--
+(Optional, string) Boolean logic used to interpret text in the `query` value.
+Valid values are:
+
+`or` (Default)::
+For example, a `query` value of `database systems` is interpreted as `database
+OR systems`.
+
+`and`::
+For example, a `query` value of `database systems` is interpreted as `database
+AND systems`.
+--
+
+`minimum_should_match`::
++
+--
+(Optional, string) Minimum number of clauses that must match for a document to
+be returned. See the <<query-dsl-minimum-should-match, `minimum_should_match`
+parameter>> for valid values and more information.
+--
+
+`zero_terms_query`::
++
+--
+(Optional, string) Specifies what is returned if the `analyzer` removes all
+tokens, such as when using a `stop` filter. Valid values are:
+
+`none` (Default)::
+No documents are returned if the `analyzer` removes all tokens.
+
+`all`::
+Returns all documents, similar to a <<query-dsl-match-all-query,`match_all`>>
+query.
+
+See <<query-dsl-match-query-zero>> for an example.
+--
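+
+As a sketch of how the optional parameters fit into a request, the example
+below keeps the default `or` operator and asks for at least two of the
+analyzed query terms to match. The query text, field names, and the value `2`
+are illustrative assumptions, not recommended settings.
+
+[source,console]
+--------------------------------------------------
+GET /_search
+{
+  "query": {
+    "combined_fields" : {
+      "query":                "distributed database systems",
+      "fields":               [ "title", "abstract", "body" ],
+      "minimum_should_match": "2"
+    }
+  }
+}
+--------------------------------------------------
+
+Because `minimum_should_match` is applied per term, the matching terms may
+come from different fields.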
+
+==== Comparison to `multi_match` query
+
+The `combined_fields` query provides a principled way of matching and scoring
+across multiple <<text, `text`>> fields. To support this, it requires that all
+fields have the same search <<analyzer,`analyzer`>>.
+
+If you want a single query that handles fields of different types like
+keywords or numbers, then the <<query-dsl-multi-match-query,`multi_match`>>
+query may be a better fit. It supports both text and non-text fields, and
+accepts text fields that do not share the same analyzer.
+
+`multi_match` takes a field-centric view of the query by default. In contrast,
+`combined_fields` is term-centric: `operator` and `minimum_should_match` are
+applied per-term, instead of per-field. Concretely, a query like
+
+[source,console]
+--------------------------------------------------
+GET /_search
+{
+  "query": {
+    "combined_fields" : {
+      "query":      "database systems",
+      "fields":     [ "title", "abstract"],
+      "operator":   "and"
+    }
+  }
+}
+--------------------------------------------------
+
+is executed as
+
+    +(combined("database", fields:["title", "abstract"]))
+    +(combined("systems", fields:["title", "abstract"]))
+
+In other words, each term must be present in at least one field for a
+document to match. The terms do not all need to appear in the same field.
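+
+For contrast, here is a sketch of the same search written as a `multi_match`
+query with its default, field-centric behavior. With `operator` set to `and`,
+all terms must be present in a single field for a document to match. The
+request reuses the field names from the example above and is shown only for
+comparison.
+
+[source,console]
+--------------------------------------------------
+GET /_search
+{
+  "query": {
+    "multi_match" : {
+      "query":      "database systems",
+      "fields":     [ "title", "abstract"],
+      "operator":   "and"
+    }
+  }
+}
+--------------------------------------------------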
+
+[NOTE]
+.Custom similarities
+===================================================
+The `combined_fields` query currently only supports the `BM25` similarity
+(which is the default unless a <<index-modules-similarity, custom similarity>>
+is configured). <<similarity, Per-field similarities>> are also not allowed.
+Using `combined_fields` in either of these cases will result in an error.
+===================================================
diff --git a/docs/reference/query-dsl/full-text-queries.asciidoc b/docs/reference/query-dsl/full-text-queries.asciidoc
@@ -1,9 +1,9 @@
 [[full-text-queries]]
 == Full text queries
 
-The full text queries enable you to search <<analysis,analyzed text fields>> such as the 
-body of an email. The query string is processed using the same analyzer that was applied to 
-the field during indexing. 
+The full text queries enable you to search <<analysis,analyzed text fields>> such as the
+body of an email. The query string is processed using the same analyzer that was applied to
+the field during indexing.
 
 The queries in this group are:
 
@@ -21,13 +21,16 @@ the last term, which is matched as a `prefix` query
 
 <<query-dsl-match-query-phrase,`match_phrase` query>>::
 Like the `match` query but used for matching exact phrases or word proximity matches.
-    
+
 <<query-dsl-match-query-phrase-prefix,`match_phrase_prefix` query>>::
 Like the `match_phrase` query, but does a wildcard search on the final word.
-  
+
 <<query-dsl-multi-match-query,`multi_match` query>>::
 The multi-field version of the `match` query.
 
+<<query-dsl-combined-fields-query,`combined_fields` query>>::
+Matches over multiple fields as if they had been indexed into one combined field.
+
 <<query-dsl-query-string-query,`query_string` query>>::
 Supports the compact Lucene <<query-string-syntax,query string syntax>>,
 allowing you to specify AND|OR|NOT conditions and multi-field search
@@ -48,8 +51,10 @@ include::match-phrase-query.asciidoc[]
 
 include::match-phrase-prefix-query.asciidoc[]
 
+include::combined-fields-query.asciidoc[]
+
 include::multi-match-query.asciidoc[]
 
 include::query-string-query.asciidoc[]
 
-include::simple-query-string-query.asciidoc[]
+include::simple-query-string-query.asciidoc[]
diff --git a/docs/reference/query-dsl/multi-match-query.asciidoc b/docs/reference/query-dsl/multi-match-query.asciidoc
@@ -192,7 +192,10 @@ This query is executed as:
 In other words, *all terms* must be present *in a single field* for a document
 to match.
 
-See <<type-cross-fields>> for a better solution.
+The <<query-dsl-combined-fields-query, `combined_fields`>> query offers a
+term-centric approach that handles `operator` and `minimum_should_match` on a
+per-term basis. The <<type-cross-fields,`cross_fields`>> multi-match type also
+addresses this issue.
 
 ===================================================
 
@@ -385,6 +388,12 @@ explanation:
 Also, accepts `analyzer`, `boost`, `operator`, `minimum_should_match`,
 `lenient` and `zero_terms_query`.
 
+WARNING: The `cross_fields` type blends field statistics in a way that does
+not always produce well-formed scores (for example, scores can become
+negative). As an alternative, you can consider the
+<<query-dsl-combined-fields-query,`combined_fields`>> query, which is also
+term-centric but combines field statistics in a more robust way.
+
 [[cross-field-analysis]]
 ===== `cross_field` and analysis