Description
We have a number of filters that can help make search faster:
- shingles for faster phrases
- n-grams for infix search
- edge n-grams for prefix/suffix search
Yet leveraging them to improve search speed typically makes Elasticsearch much harder to use since query parsers are not aware of whether these filters are in use.
To give the example of prefix search, I'm wondering whether we should add a MappedFieldType.prefixQuery factory method that would be called by query parsers. Regular text fields would still create a PrefixQuery, but we could have a new field type optimized for prefix search that would automatically add a filter to the analysis chain at index time. It would be like the edge n-gram filter, except that it would add a marker to differentiate prefixes from actual terms. For instance, if we want to optimize prefix queries for prefixes of up to 4 chars, we could analyze foobar as [foobar, \0f, \0fo, \0foo, \0foob]. I'm using \0 here, but anything that can help differentiate prefixes from the original term while preventing collisions would work.
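
To make the index-time side concrete, here is a minimal Python sketch of the token expansion described above. It is not Elasticsearch code: the function name index_terms, the max_prefix_len parameter, and the PREFIX_MARKER constant are hypothetical names chosen for illustration, and a real implementation would presumably be a token filter in the analysis chain.

```python
# Hypothetical marker; anything that cannot collide with real terms would work.
PREFIX_MARKER = "\0"


def index_terms(term, max_prefix_len=4):
    """Return the original term followed by its marked prefixes.

    Prefixes are emitted for lengths 1..max_prefix_len (capped at the
    term length), each one prepended with PREFIX_MARKER so it cannot be
    confused with an actual term.
    """
    tokens = [term]
    for length in range(1, min(max_prefix_len, len(term)) + 1):
        tokens.append(PREFIX_MARKER + term[:length])
    return tokens
```

With max_prefix_len=4, index_terms("foobar") yields foobar followed by the marked prefixes \0f, \0fo, \0foo and \0foob, which matches the example above.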
Then at search time, MappedFieldType.prefixQuery would look at the length of the term: if it has 4 chars or fewer, it would prepend a \0 and run a term query; otherwise it would run a regular PrefixQuery.
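
The search-time decision could look roughly like the following Python sketch. It only illustrates the branching logic: the function name prefix_query, its parameters, and the PREFIX_MARKER constant are assumptions, and the returned dicts merely mimic the shape of term and prefix queries in the query DSL rather than the Lucene queries the proposed Java method would build.

```python
# Hypothetical marker; must match whatever was used at index time.
PREFIX_MARKER = "\0"


def prefix_query(field, prefix, max_prefix_len=4):
    """Pick the cheapest query that answers a prefix search.

    Short prefixes were indexed as marked terms, so a single term
    lookup is enough. Longer prefixes fall back to a regular prefix
    query against the original terms.
    """
    if len(prefix) <= max_prefix_len:
        return {"term": {field: PREFIX_MARKER + prefix}}
    return {"prefix": {field: prefix}}
```

For example, prefix_query("title", "foo") produces a term query on \0foo, while prefix_query("title", "fooba") exceeds the 4-char limit and falls back to a prefix query on fooba.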
We could do the same for infix search or phrase queries using similar ideas.