Reformats term vectors APIs #47484

Merged
merged 5 commits into from
Oct 5, 2019
85 changes: 69 additions & 16 deletions docs/reference/docs/multi-termvectors.asciidoc
@@ -1,14 +1,10 @@
[[docs-multi-termvectors]]
=== Multi termvectors API
=== Multi term vectors API
++++
<titleabbrev>Multi term vectors</titleabbrev>
++++

Multi termvectors API allows to get multiple termvectors at once. The
documents from which to retrieve the term vectors are specified by an index and id.
But the documents could also be artificially provided in the request itself.

The response includes a `docs`
array with all the fetched termvectors, each element having the structure
provided by the <<docs-termvectors,termvectors>>
API. Here is an example:
Retrieves multiple term vectors with a single request.

[source,console]
--------------------------------------------------
@@ -32,10 +28,64 @@ POST /_mtermvectors
--------------------------------------------------
// TEST[setup:twitter]

See the <<docs-termvectors,termvectors>> API for a description of possible parameters.
[[docs-multi-termvectors-api-request]]
==== {api-request-title}

`POST /_mtermvectors`

`POST /<index>/_mtermvectors`

[[docs-multi-termvectors-api-desc]]
==== {api-description-title}

You can specify existing documents by index and ID or
provide artificial documents in the body of the request.
The index can be specified in the body of the request or in the request URI.

The response contains a `docs` array with all the fetched termvectors.
Each element has the structure provided by the <<docs-termvectors,termvectors>>
API.

See the <<docs-termvectors,termvectors>> API for details about the information
that can be included in the response.
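
As a rough illustration of the two ways to address documents described above, the following Python sketch builds `_mtermvectors` request bodies as plain dictionaries and prints them as JSON. The helper names (`docs_body`, `artificial_docs_body`) and the sample values are hypothetical and not part of this change. The `docs`, `_index`, `_id`, and `doc` keys are assumed to match the request format used by the console examples. Nothing is sent to a cluster.

```python
import json


def docs_body(doc_refs):
    """Build a body that addresses existing documents by index and ID.

    doc_refs is an iterable of (index, doc_id) pairs; the helper itself is
    hypothetical and exists only for this illustration.
    """
    return {"docs": [{"_index": index, "_id": doc_id} for index, doc_id in doc_refs]}


def artificial_docs_body(index, documents):
    """Build a body that supplies artificial documents inline.

    Each document goes under the "doc" key (assumed request format); the
    mapping of `index` determines how its fields are analyzed.
    """
    return {"docs": [{"_index": index, "doc": doc} for doc in documents]}


existing = docs_body([("twitter", "1"), ("twitter", "2")])
artificial = artificial_docs_body("twitter", [{"message": "twitter test test test"}])

print(json.dumps(existing, indent=2))
print(json.dumps(artificial, indent=2))
```

Either dictionary could be sent as the body of `POST /_mtermvectors`. When the index is given in the request URI instead, the `_index` entries could presumably be dropped.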

[[docs-multi-termvectors-api-path-params]]
==== {api-path-parms-title}

`<index>`::
(Optional, string) Name of the index that contains the documents.

[[docs-multi-termvectors-api-query-params]]
==== {api-query-parms-title}

include::{docdir}/rest-api/common-parms.asciidoc[tag=fields]

include::{docdir}/rest-api/common-parms.asciidoc[tag=field_statistics]

The `_mtermvectors` endpoint can also be used against an index (in which case it
is not required in the body):
include::{docdir}/rest-api/common-parms.asciidoc[tag=offsets]

include::{docdir}/rest-api/common-parms.asciidoc[tag=payloads]

include::{docdir}/rest-api/common-parms.asciidoc[tag=positions]

include::{docdir}/rest-api/common-parms.asciidoc[tag=preference]

include::{docdir}/rest-api/common-parms.asciidoc[tag=routing]

include::{docdir}/rest-api/common-parms.asciidoc[tag=realtime]

include::{docdir}/rest-api/common-parms.asciidoc[tag=term_statistics]

include::{docdir}/rest-api/common-parms.asciidoc[tag=version]

include::{docdir}/rest-api/common-parms.asciidoc[tag=version_type]

[float]
[[docs-multi-termvectors-api-example]]
==== {api-examples-title}

If you specify an index in the request URI, the index does not need to be specified for each document
in the request body:

[source,console]
--------------------------------------------------
@@ -57,7 +107,8 @@ POST /twitter/_mtermvectors
--------------------------------------------------
// TEST[setup:twitter]

If all requested documents are on same index and also the parameters are the same, the request can be simplified:
If all requested documents are in the same index and the parameters are the same, you can use the
following simplified syntax:

[source,console]
--------------------------------------------------
@@ -74,9 +125,11 @@ POST /twitter/_mtermvectors
--------------------------------------------------
// TEST[setup:twitter]

Additionally, just like for the <<docs-termvectors,termvectors>>
API, term vectors could be generated for user provided documents.
The mapping used is determined by `_index`.
[[docs-multi-termvectors-artificial-doc]]
===== Artificial documents

You can also use `mtermvectors` to generate term vectors for _artificial_ documents provided
in the body of the request. The mapping used is determined by the specified `_index`.

[source,console]
--------------------------------------------------
101 changes: 72 additions & 29 deletions docs/reference/docs/termvectors.asciidoc
@@ -1,38 +1,47 @@
[[docs-termvectors]]
=== Term Vectors
=== Term vectors API
++++
<titleabbrev>Term vectors</titleabbrev>
++++

Returns information and statistics on terms in the fields of a particular
document. The document could be stored in the index or artificially provided
by the user. Term vectors are <<realtime,realtime>> by default, not near
realtime. This can be changed by setting `realtime` parameter to `false`.
Retrieves information and statistics for terms in the fields of a particular document.

[source,console]
--------------------------------------------------
GET /twitter/_termvectors/1
--------------------------------------------------
// TEST[setup:twitter]

Optionally, you can specify the fields for which the information is
retrieved either with a parameter in the url
[[docs-termvectors-api-request]]
==== {api-request-title}

`GET /<index>/_termvectors/<_id>`

[[docs-termvectors-api-desc]]
==== {api-description-title}

You can retrieve term vectors for documents stored in the index or
for _artificial_ documents passed in the body of the request.

You can specify the fields you are interested in through the `fields` parameter,
or by adding the fields to the request body.

[source,console]
--------------------------------------------------
GET /twitter/_termvectors/1?fields=message
--------------------------------------------------
// TEST[setup:twitter]

or by adding the requested fields in the request body (see
example below). Fields can also be specified with wildcards
in similar way to the <<query-dsl-multi-match-query,multi match query>>
Fields can be specified using wildcards, similar to the <<query-dsl-multi-match-query,multi match query>>.

[float]
==== Return values
Term vectors are <<realtime,real-time>> by default, not near real-time.
This can be changed by setting the `realtime` parameter to `false`.

Three types of values can be requested: _term information_, _term statistics_
You can request three types of values: _term information_, _term statistics_
and _field statistics_. By default, all term information and field
statistics are returned for all fields but no term statistics.
statistics are returned for all fields but term statistics are excluded.
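
To make the defaults concrete, here is a minimal Python sketch that assembles the request path for `GET /<index>/_termvectors/<_id>`. The function name `termvectors_path` is hypothetical, and passing `fields` as a comma-separated query value is an assumption based on the `fields` parameter described above. The sketch only builds a string and does not contact a cluster.

```python
from urllib.parse import quote, urlencode


def termvectors_path(index, doc_id, fields=None, term_statistics=False,
                     field_statistics=True):
    """Return the request path for GET /<index>/_termvectors/<_id>."""
    params = {}
    if fields:
        # Assumption: several fields are passed as one comma-separated value.
        params["fields"] = ",".join(fields)
    # Only emit flags that differ from the defaults described above:
    # term statistics are off and field statistics are on unless requested.
    if term_statistics:
        params["term_statistics"] = "true"
    if not field_statistics:
        params["field_statistics"] = "false"
    path = f"/{quote(index, safe='')}/_termvectors/{quote(doc_id, safe='')}"
    return f"{path}?{urlencode(params)}" if params else path


print(termvectors_path("twitter", "1"))
print(termvectors_path("twitter", "1", fields=["message"], term_statistics=True))
```

With no optional arguments the path carries no query string, which corresponds to the default of term information plus field statistics.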

[float]
[[docs-termvectors-api-term-info]]
===== Term information

* term frequency in the field (always returned)
@@ -52,7 +61,7 @@ should make sure that the string you are taking a sub-string of is also encoded
using UTF-16.
======

[float]
[[docs-termvectors-api-term-stats]]
===== Term statistics

Setting `term_statistics` to `true` (default is `false`) will
@@ -65,7 +74,7 @@ return
By default these values are not returned since term statistics can
have a serious performance impact.

[float]
[[docs-termvectors-api-field-stats]]
===== Field statistics

Setting `field_statistics` to `false` (default is `true`) will
@@ -77,8 +86,8 @@ omit :
* sum of total term frequencies (the sum of total term frequencies of
each term in this field)

[float]
===== Terms Filtering
[[docs-termvectors-api-terms-filtering]]
===== Terms filtering

With the parameter `filter`, the terms returned could also be filtered based
on their tf-idf scores. This could be useful in order to find out a good
@@ -105,7 +114,7 @@ The following sub-parameters are supported:
`max_word_length`::
The maximum word length above which words will be ignored. Defaults to unbounded (`0`).
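
A request body that uses `filter` might be assembled as in the Python sketch below. The helper `filtered_termvectors_body`, the `plot` field, and the sample text are hypothetical. Of the sub-parameters used, only `max_word_length` appears in the visible part of this diff; `max_num_terms` and `min_term_freq` are assumed to be among the supported sub-parameters in the collapsed lines.

```python
import json


def filtered_termvectors_body(doc, **filter_params):
    """Build a term vectors request body with a `filter` section.

    `doc` is an artificial document; `filter_params` are passed through
    unchanged as filter sub-parameters (for example max_word_length).
    """
    return {
        "doc": doc,
        # Assumption: tf-idf scoring relies on term statistics, so they are
        # requested explicitly (they are off by default).
        "term_statistics": True,
        "field_statistics": True,
        "filter": dict(filter_params),
    }


body = filtered_termvectors_body(
    {"plot": "a short hypothetical plot summary"},
    max_num_terms=3,       # assumed sub-parameter name
    min_term_freq=1,       # assumed sub-parameter name
    max_word_length=0,     # 0 means unbounded, per the description above
)
print(json.dumps(body, indent=2))
```

The helper deliberately does no validation, so a misspelled sub-parameter would be passed through as-is.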

[float]
[[docs-termvectors-api-behavior]]
==== Behaviour

The term and field statistics are not accurate. Deleted documents
@@ -116,8 +125,45 @@ whereas the absolute numbers have no meaning in this context. By default,
when requesting term vectors of artificial documents, a shard to get the statistics
from is randomly selected. Use `routing` only to hit a particular shard.

[float]
===== Example: Returning stored term vectors
[[docs-termvectors-api-path-params]]
==== {api-path-parms-title}

`<index>`::
(Required, string) Name of the index that contains the document.

`<_id>`::
(Optional, string) Unique identifier of the document.

[[docs-termvectors-api-query-params]]
==== {api-query-parms-title}

include::{docdir}/rest-api/common-parms.asciidoc[tag=fields]

include::{docdir}/rest-api/common-parms.asciidoc[tag=field_statistics]

include::{docdir}/rest-api/common-parms.asciidoc[tag=offsets]

include::{docdir}/rest-api/common-parms.asciidoc[tag=payloads]

include::{docdir}/rest-api/common-parms.asciidoc[tag=positions]

include::{docdir}/rest-api/common-parms.asciidoc[tag=preference]

include::{docdir}/rest-api/common-parms.asciidoc[tag=routing]

include::{docdir}/rest-api/common-parms.asciidoc[tag=realtime]

include::{docdir}/rest-api/common-parms.asciidoc[tag=term_statistics]

include::{docdir}/rest-api/common-parms.asciidoc[tag=version]

include::{docdir}/rest-api/common-parms.asciidoc[tag=version_type]

[[docs-termvectors-api-example]]
==== {api-examples-title}

[[docs-termvectors-api-stored-termvectors]]
===== Returning stored term vectors

First, we create an index that stores term vectors, payloads etc. :

@@ -259,8 +305,8 @@ Response:
// TEST[continued]
// TESTRESPONSE[s/"took": 6/"took": "$body.took"/]

[float]
===== Example: Generating term vectors on the fly
[[docs-termvectors-api-generate-termvectors]]
===== Generating term vectors on the fly

Term vectors which are not explicitly stored in the index are automatically
computed on the fly. The following request returns all information and statistics for the
@@ -281,8 +327,7 @@ GET /twitter/_termvectors/1
// TEST[continued]

[[docs-termvectors-artificial-doc]]
[float]
===== Example: Artificial documents
===== Artificial documents

Term vectors can also be generated for artificial documents,
that is for documents not present in the index. For example, the following request would
@@ -304,7 +349,6 @@ GET /twitter/_termvectors
// TEST[continued]

[[docs-termvectors-per-field-analyzer]]
[float]
====== Per-field analyzer

Additionally, a different analyzer than the one at the field may be provided
@@ -369,8 +413,7 @@ Response:


[[docs-termvectors-terms-filtering]]
[float]
===== Example: Terms filtering
===== Terms filtering

Finally, the terms returned could be filtered based on their tf-idf scores. In
the example below we obtain the three most "interesting" keywords from the