Completion Suggester V2 #10746

Closed
@areek

Completion Suggester V2

The completion suggester provides auto-complete/search-as-you-type functionality.
This is a navigational feature to guide users to relevant results as they are typing, improving search precision. It is not meant for spell correction or did-you-mean functionality like the term or phrase suggesters.

The completions are indexed as a weighted FST (finite state transducer) to provide fast Top N prefix-based
searches suitable for serving relevant results as a user types.
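
As a rough illustration of what a weighted Top N prefix lookup returns, the sketch below uses a sorted list and binary search in place of an FST. The function name top_n_by_prefix and the sample entries are hypothetical, and this is not how Lucene stores or traverses completions; it only mirrors the observable behaviour described above.

```python
import bisect
import heapq


def top_n_by_prefix(entries, prefix, n):
    """Return up to n (text, weight) pairs that start with prefix, heaviest first.

    entries must be a list of (text, weight) tuples sorted by text.
    """
    texts = [text for text, _ in entries]
    # All strings sharing a prefix are contiguous in sorted order.
    start = bisect.bisect_left(texts, prefix)
    matches = []
    for text, weight in entries[start:]:
        if not text.startswith(prefix):
            break  # sorted order: no later entry can match
        matches.append((text, weight))
    return heapq.nlargest(n, matches, key=lambda pair: pair[1])


# Hypothetical sample data, sorted by text.
entries = sorted([
    ("title1", 14),
    ("alternate_title", 7),
    ("titanic", 3),
    ("title2", 9),
])
print(top_n_by_prefix(entries, "titl", 2))
```

An FST gives the same kind of answer with a much smaller memory footprint, which is why the proposal relies on it rather than a plain sorted list.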

Notable Features:

  • Document oriented suggestions:
    • Near-real time.
    • Deleted document filtering.
    • Multiple Context support.
    • Return document field values via payload.
  • Query Interface:
    • Regular expression support via regex.
    • Typo tolerance via fuzzy.
    • Context boosting at query time.

Completion Suggester V2 is based on LUCENE-6339 and LUCENE-6459, the first iteration of Lucene's new suggest API.

Mapping

The completion fields are indexed in a special way, hence a field mapping has to be defined.
The following shows a field mapping for a completion field named title_suggest:

PUT {INDEX_NAME}
{
 "mappings": {
  "{TYPE_NAME}": {
   "properties": {
    "title_suggest": {
     "type": "completion"
    }
   }
  }
 }
}

You can choose the index-time and search-time analyzers for the completion field by adding the analyzer
and search_analyzer options.

Context Mappings

Adding a contexts option in the field mapping defines a context-enabled completion field. You may want
a context-enabled completion field if you need to filter or boost suggestions by a criterion other than
just the prefix. Note that adding high-cardinality context values will increase the size of the in-memory
index significantly.

Two context types are supported: category and geo.

Category Context Mapping

Category contexts are indexed as prefixes to the completion field value.

The following adds a category context named genre:

...
"contexts": [
 {
   "name": "genre",
   "type": "category"
 }
]

You can also pull context values from another field in a document by using a path option specifying the field name.

Geo Context Mapping

Geo points are encoded as geohash strings and prefixed to the completion field value.
The following adds a geo context named location:

...
"contexts": [
 {
   "name": "location",
   "type": "geo"
 }
]

You can also set the precision option to choose the geohash length, and the path option to pull context values
from another field in the document.
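
To make the relationship between precision and geohash length concrete, here is a minimal sketch of standard geohash encoding. The function name geohash_encode and the sample coordinates are illustrative assumptions; the exact encoding and the way the hash is joined to the completion value inside the suggester are not specified in this proposal.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Hypothetical coordinates; a longer hash refines the same cell.
short_hash = geohash_encode(57.64911, 10.40744, 3)
long_hash = geohash_encode(57.64911, 10.40744, 6)
print(short_hash, long_hash)
```

A shorter geohash covers a larger cell, so a lower precision groups more points under the same context value.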

Indexing

Just like any other field, you can add multiple completion fields to a document. You can also index multiple completions
for a completion field per document. Each completion value is tied to its document and can be assigned an index-time
weight, which determines its relative rank among other completion values which share a common prefix.

The following indexes a completion value and its weight for the title_suggest completion field:

POST {INDEX_NAME}/{TYPE_NAME}
{
 "title_suggest": {
  "input": "title1",
  "weight": 7
 }
}

You can use the short form if you prefer not to assign a weight to the completions:

POST {INDEX_NAME}/{TYPE_NAME}
{
 "title_suggest": "title1"
}

Arrays are also supported for indexing multiple values.

The following indexes multiple completion entries (input and weight) for a single document:

POST {INDEX_NAME}/{TYPE_NAME}
{
 "title_suggest": [
  {
   "input": "title1",
   "weight": 14
  },
  {
   "input": "alternate_title",
   "weight": 7
  }
 ]
}
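
The three accepted shapes (a bare string, a single object, and an array of either) can be viewed as one list of (input, weight) pairs. The helper below is a hypothetical sketch of that normalization; the name normalize_completions and the default weight of 1 for entries without an explicit weight are assumptions, not behaviour stated in this proposal.

```python
def normalize_completions(value, default_weight=1):
    """Turn a completion field value into a list of (input, weight) pairs.

    default_weight is an assumed placeholder for entries with no weight.
    """
    items = value if isinstance(value, list) else [value]
    pairs = []
    for item in items:
        if isinstance(item, str):
            # Short form: just the completion text.
            pairs.append((item, default_weight))
        else:
            # Object form: "input" plus an optional "weight".
            pairs.append((item["input"], item.get("weight", default_weight)))
    return pairs


print(normalize_completions("title1"))
print(normalize_completions({"input": "title1", "weight": 7}))
print(normalize_completions([
    {"input": "title1", "weight": 14},
    {"input": "alternate_title", "weight": 7},
]))
```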

Indexing context-enabled fields

You can either use the path option mentioned previously to pull context values from another field
in the document, or add a contexts option to the completion entry while indexing.

The following explicitly indexes context values along with completions:

POST {INDEX_NAME}/{TYPE_NAME}
{
 "genre_title_suggest": {
  "input": "title1",
  "contexts": {
   "genre": ["genre1", "genre2"]
  },
  "weight": 7
 }
}

You can also configure the path option in the context mapping to pull values from another
field as follows (assuming the path for the genre context has been set to the genre field):

POST {INDEX_NAME}/{TYPE_NAME}
{
 "genre_title_suggest": "title1",
 "genre": ["genre1", "genre2"]
}

Query Interface

The point of indexing values as completions is to be able to run fast prefix-based searches on them.
You can run Prefix, Fuzzy and Regex queries on all completion fields. For a context-enabled completion
field, providing no context means all contexts will be considered. However, you cannot run a Context
query on a completion field with no contexts. When a query is run on a context-enabled field, the
contexts for a completion are returned with the suggestion.

Prefix Query

The following suggests completions from the field title_suggest that start with titl:

POST {INDEX}/_suggest
{
 "suggest-namespace" : {
  "prefix" : "titl",
  "completion" : {
   "field" : "title_suggest"
  }
 }
}

The suggestions are sorted by their index-time weight.

Fuzzy Prefix Query

A fuzzy prefix query can serve typo-tolerant suggestions. It scores suggestions that are closer to the provided
prefix (based on their edit distance) higher, regardless of their weight.

POST {INDEX}/_suggest
{
 "suggest-namespace" : {
  "prefix" : "sug",
  "completion" : {
   "field" : "suggest",
   "fuzzy" : {        (1)
    "fuzziness" : 2
   }
  }
 }
}

Specify fuzzy as shown in (1) to use the typo-tolerant suggester. See the full options for fuzzy.
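
The idea of ranking by edit distance to the typed prefix can be sketched as follows. This is a toy model, not Lucene's automaton-based implementation: the function names are hypothetical, plain Levenshtein distance is used, and breaking ties by weight is an assumption.

```python
def prefix_edit_distance(query, candidate):
    """Smallest Levenshtein distance between query and any prefix of candidate."""
    # prev[j] is the distance between the query consumed so far and candidate[:j].
    prev = list(range(len(candidate) + 1))
    for i, query_char in enumerate(query, start=1):
        cur = [i]
        for j, cand_char in enumerate(candidate, start=1):
            cost = 0 if query_char == cand_char else 1
            cur.append(min(prev[j] + 1,          # delete from query
                           cur[j - 1] + 1,       # insert into query
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    return min(prev)


def fuzzy_suggest(query, entries, fuzziness):
    """Return texts within `fuzziness` edits, closest first.

    entries is a list of (text, weight); ties fall back to weight (assumed).
    """
    scored = []
    for text, weight in entries:
        distance = prefix_edit_distance(query, text)
        if distance <= fuzziness:
            scored.append((distance, -weight, text))
    return [text for _, _, text in sorted(scored)]


# Hypothetical sample data.
entries = [("suggestion", 1), ("sagging", 50), ("title", 99)]
print(fuzzy_suggest("sug", entries, 1))
```

Note how "suggestion" ranks ahead of "sagging" despite its much lower weight, matching the statement that closeness to the prefix takes precedence over weight.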

Regex Prefix Query

A regex prefix query matches all the term prefixes that match a regular expression. The regex is anchored at the beginning but not at the end.
The suggestions are sorted by their index-time weight.

POST {INDEX}/_suggest
{
 "suggest-namespace" : {
  "regex" : "s[u|a]g",    (1)
  "completion" : {
   "field" : "suggest"
  }
 }
}

Specify regex as shown in (1), instead of prefix, to use regular expressions. See the supported regular expression syntax.

Context Query

Adding the contexts option (1) to the query enables filtering and/or boosting suggestions based on their context values.
This query scores suggestions by multiplying the query-time boost with the suggestion weight.

POST {INDEX}/_suggest
{
 "suggest-namespace" : {
  "prefix" : "sug",
  "completion" : {
   "field" : "genre_title_suggest",
   "contexts": {           (1)
    "genre": [
     {
      "context" : "rock",
      "boost" : 3
     },
     {
      "context" : "indie",
      "boost" : 2
     }
    ]
   }
  }
 }
}

The contexts can also be specified without any boost:

  ...
  "contexts": {
    "genre" : ["rock", "indie"]
  }
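
The filter-and-boost behaviour can be sketched in a few lines. The function name context_suggest and the sample data are hypothetical; using the largest boost when an entry matches several requested contexts is an assumption, since the proposal only states that the score is the query-time boost multiplied by the suggestion weight.

```python
def context_suggest(entries, prefix, query_contexts):
    """Filter by context and score as weight * boost.

    entries: list of (text, weight, set_of_context_values).
    query_contexts: dict mapping a context value to its boost.
    """
    scored = []
    for text, weight, contexts in entries:
        if not text.startswith(prefix):
            continue
        boosts = [query_contexts[c] for c in contexts if c in query_contexts]
        if not boosts:
            continue  # filtered out: none of the requested contexts match
        # Assumption: with several matching contexts, the largest boost applies.
        scored.append((text, weight * max(boosts)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data.
entries = [
    ("sugar", 5, {"rock"}),
    ("sugarcane", 10, {"indie"}),
    ("sugarplum", 100, {"jazz"}),
]
print(context_suggest(entries, "sug", {"rock": 3, "indie": 2}))
```

Here "sugarplum" is dropped despite its high weight because its context was not requested, while the other two are ordered by boosted score.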

Geo Context Query

The result will be scored such that the suggestions are first sorted by the distance between the corresponding geo context and the provided
geo location and then by the weight of the suggestions.

  ...
  "contexts" : {
    "location" : {
      "context" : {
        "lat" : ..,
        "lon" : ..
      },
      "precision" : ..
    }
  }

Example

The following performs a Fuzzy Prefix Query combined with a Context Query on a context-enabled completion field named genre_song_suggest.

POST {INDEX}/_suggest
{
 "suggest-namespace" : {
  "prefix" : "like a roling st",
  "completion" : {
   "field" : "genre_song_suggest",
   "fuzzy" : {
    "fuzziness" : 2
   },
   "contexts" : {
    "genre": [
     {
      "context" : "rock", 
      "boost" : 3
     },
     {
      "context" : "indie",
      "boost" : 2
     }
    ]
   }
  }
 }
}

This query will return all song names for the genres rock and indie that are within an edit distance of 2 from the prefix like a roling st.
Song names with the genre rock will be boosted higher than those with the genre indie.
Completion field values that share the longest prefix with like a roling st will be boosted further.

Payload

You can retrieve document field values along with the completions by using the payload option.
The following returns the url field with each suggestion entry:

POST {INDEX}/_suggest
{
 "suggest-namespace" : {
  "prefix" : "titl",
  "completion" : {
   "field" : "title_suggest",
   "payload" : ["url"]
  }
 }
}

The response format is as follows:

{
 ...
 "suggest-namespace" : [ 
  {
   "prefix" : "sugg",
   "offset" : 0,
   "length" : 4,
   "options" : [ 
    {
     "text" : "suggestion",
     "score" : 34.0, 
     "payload": {
       "url" : [ "url_1" ]
     }
    }
   ]
  } 
 ]
}
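
A client consuming this response might flatten it as in the sketch below. The helper name extract_options is hypothetical, and the sample body is the response shown above with the elided part left out.

```python
import json

# Sample response body, copied from the format above (elided fields omitted).
response = json.loads("""
{
 "suggest-namespace" : [
  {
   "prefix" : "sugg",
   "offset" : 0,
   "length" : 4,
   "options" : [
    {
     "text" : "suggestion",
     "score" : 34.0,
     "payload": {
       "url" : [ "url_1" ]
     }
    }
   ]
  }
 ]
}
""")


def extract_options(response, namespace):
    """Return (text, score, payload) for every option under a suggest namespace."""
    rows = []
    for entry in response.get(namespace, []):
        for option in entry.get("options", []):
            # payload is only present when the query asked for it.
            rows.append((option["text"], option["score"], option.get("payload", {})))
    return rows


print(extract_options(response, "suggest-namespace"))
```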
