
Make it easier to optimize search with better analysis #27049

Closed
@jpountz

Description


We have a number of filters that can help make search faster:

  • shingles for faster phrase queries
  • n-grams for infix search
  • edge n-grams for prefix/suffix search
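For reference, here is a simplified plain-Java illustration of the kind of tokens each of these filters produces. It is not Lucene's implementation. The class and method names are hypothetical, and positions, offsets and most filter options are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, illustrative versions of what shingle, n-gram and edge n-gram
// filters emit. Not Lucene's code: positions, offsets and options are ignored.
public class FilterOutputSketch {

    // Word shingles of exactly `size` consecutive tokens. (The real shingle
    // filter also emits unigrams by default and supports a size range.)
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }

    // Character n-grams of length n, taken from every position in the term.
    static List<String> ngrams(String term, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= term.length(); i++) {
            out.add(term.substring(i, i + n));
        }
        return out;
    }

    // Edge n-grams: prefixes of the term from minLen up to maxLen characters.
    static List<String> edgeNgrams(String term, int minLen, int maxLen) {
        List<String> out = new ArrayList<>();
        for (int len = minLen; len <= Math.min(maxLen, term.length()); len++) {
            out.add(term.substring(0, len));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles(List.of("quick", "brown", "fox"), 2)); // [quick brown, brown fox]
        System.out.println(ngrams("foobar", 3));                           // [foo, oob, oba, bar]
        System.out.println(edgeNgrams("foobar", 1, 4));                    // [f, fo, foo, foob]
    }
}
```

A phrase query for "quick brown" can then become a single term lookup on a shingle, and an infix query for "oba" a single term lookup on an n-gram, but only if the query parser knows these tokens exist.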

Yet leveraging them to speed up search typically makes Elasticsearch much harder to use, since query parsers are not aware of whether these filters are in use.

To take prefix search as an example, I'm wondering whether we should add a MappedFieldType.prefixQuery factory method that query parsers would call. Regular text fields would still create a PrefixQuery, but we could add a new field type optimized for prefix search that automatically adds a filter to the analysis chain at index time. This filter would be similar to the edge n-gram filter, except that it would add a marker to differentiate prefixes from actual terms. For instance, if we want to optimize prefix queries for prefixes of up to 4 chars, we could analyze foobar as [foobar, \0f, \0fo, \0foo, \0foob]. I'm using \0 here, but anything that differentiates prefixes from the original terms while preventing collisions would work.
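A minimal sketch of the proposed index-time expansion, assuming a maximum prefix length of 4 and \0 as the marker, as in the foobar example. PrefixIndexingSketch and indexTokens are hypothetical names; a real implementation would be a Lucene TokenFilter in the field's analysis chain rather than a method returning a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed analysis: emit the original term plus
// every prefix up to maxPrefixLength, each preceded by a marker so that it
// cannot collide with a real term.
public class PrefixIndexingSketch {

    static final char PREFIX_MARKER = '\0';

    static List<String> indexTokens(String term, int maxPrefixLength) {
        List<String> tokens = new ArrayList<>();
        tokens.add(term); // keep the original term so regular term queries still match
        int upTo = Math.min(maxPrefixLength, term.length());
        for (int len = 1; len <= upTo; len++) {
            // char + String concatenates into a new String, e.g. "\0fo"
            tokens.add(PREFIX_MARKER + term.substring(0, len));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Render the NUL marker as a visible "\0" for printing.
        List<String> shown = new ArrayList<>();
        for (String t : indexTokens("foobar", 4)) {
            shown.add(t.replace("\0", "\\0"));
        }
        System.out.println(shown); // [foobar, \0f, \0fo, \0foo, \0foob]
    }
}
```

Terms shorter than the maximum still get a marked prefix for their full length (foo yields \0f, \0fo, \0foo), so that a prefix query for foo also matches the term foo itself.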

Then, at search time, MappedFieldType.prefixQuery would look at the length of the prefix: if it is 4 chars or less, it would prepend \0 and run a term query; otherwise it would run a regular PrefixQuery.
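The query-time side could look roughly like this. The Query, TermQuery and PrefixQuery records are simplified stand-ins for Lucene's classes, and prefixQuery only mimics the proposed MappedFieldType.prefixQuery; its signature is an assumption. The threshold has to match the maximum prefix length used at index time (requires Java 16+ for records).

```java
// Hypothetical sketch of the proposed prefixQuery dispatch. The record types
// are stand-ins, not Lucene's TermQuery/PrefixQuery.
public class PrefixQuerySketch {

    interface Query {}
    record TermQuery(String field, String term) implements Query {}
    record PrefixQuery(String field, String prefix) implements Query {}

    static final char PREFIX_MARKER = '\0';
    // Must equal the max prefix length used by the index-time filter.
    static final int MAX_INDEXED_PREFIX_LENGTH = 4;

    static Query prefixQuery(String field, String prefix) {
        // Length is counted in UTF-16 chars here; index and query time must
        // count the same way. An empty prefix was never indexed with a marker,
        // so it falls back to a regular PrefixQuery.
        if (!prefix.isEmpty() && prefix.length() <= MAX_INDEXED_PREFIX_LENGTH) {
            // Short prefix: a single exact lookup of the marked prefix term.
            return new TermQuery(field, PREFIX_MARKER + prefix);
        }
        // Longer prefix: enumerate matching terms as a regular prefix query would.
        return new PrefixQuery(field, prefix);
    }

    public static void main(String[] args) {
        System.out.println(prefixQuery("title", "foo").getClass().getSimpleName());   // TermQuery
        System.out.println(prefixQuery("title", "fooba").getClass().getSimpleName()); // PrefixQuery
    }
}
```

The gain is that a short prefix, which would otherwise expand to many terms, becomes a single term lookup whose postings were precomputed at index time.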

We could do the same for infix search or phrase queries using similar ideas.

Labels: :Search/Search (Search-related issues that do not fall into other categories), >feature