-
Notifications
You must be signed in to change notification settings - Fork 25.2k
ContextSuggester #4044
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ContextSuggester #4044
Conversation
haven't reviewed the code, API wise it looks good to me, but can we also have as part of this pull request the relevant changes to our docs? |
@kimchy of course |
A couple of comments based on the documetnation: For mapping:
I don't think we should expose the separator? What is the reason this is an implementation detail and should not be exposed to the user. On the request side of things I don't think we should require folks to specify stuff like this:
it should rather look like this:
Instead, since we know how to parse the values ie. we know from the mapping that it is a field or geo?! For the mapping of
Then you can decide if you want to specify it manually like this:
or via a field:
and since we defined the default value in the mapping we can still have a context if the |
I left some comments on the code, looks great but I think we can simplify the API as I described above! If you push changes don't rebase your branch with master or so please just add the relevant changes add, commit and push the branch to your github repo so we can see them here. thanks !! |
Hiya I've got some comments about how to improve the API. There is no mention of what happens if a
I probably would also like to combine multiple fields to generate a context, eg:
Specifying this via the API could make things messy and overly complicated. The simple answer to this would be to concatenate these values into one field. I'd also like it if we could accept scripts which can generate a context value (or array of values) based on the _source. So we have:
The mappings should look like the following: Mapping for valueNo default value set:
Default value set:
Should Mapping for field
Mapping for script
Mapping for geoBy specifying (btw, Geo as value:
Geo as a field:
Although we could derive the geohash stuff from the referenced field, for consistency's sake, I think I prefer actually specifying it. All it needs is Geo as a script:
Indexing a documentSpecifying the
Derives value from :
Can have multiple contexts, eg:
This should generate two suggestion paths Searching:There is no need to specify geo etc, as we can figure that out from the mapping, so searching just looks like:
|
Thanks a lot for your suggestions. I had already started implementing the new style of the API and the multivalued contexts. |
Hey @simon, I think we should keep the |
@clintongormley I like the idea of using multivalued fields as context. But I think we have two options here:
The first option requires an order to create an unique context. The second option will simply generate alternative contexts. I think the later option should be used. But I like to hear your idea of those contexts. |
@chilling I think we should use a less promient character to separate to begin with like Can you reply to all the code comments etc. as well please. |
@s1monw, you're right. I think this will be an acceptable solution. |
I just managed to push a clean update. So for now we support multiple values and multiple context in the context suggester. Currently I'm working on the documentation but feel free to have a first look at the current implementation. |
public class PrefixAnalyzer extends Analyzer { | ||
|
||
public static final char DEFAULT_SEPARATOR = '|'; | ||
public static final char DEFAULT_DELIMITER = '-'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEFAULT_DELIMITER
is unused?
hey florian, I took a loot at the PR and I think it looks good though. I wonder a bit how the API (REST) looks like at this point. Can you add documentation to the PullRequest so we can also see the API from a REST perspective. It's kind of hard to tell from the tests etc. :) |
Thanks @s1monw. I already started working on docs. So documentation will be there tomorrow. |
@s1monw for now I worked on your code review and changed the most of the thinks. What keeps me busy right now, is the fuzzy logic. I think I'm not able to adjust the Mapping
Index
Query
|
Wait, you are only using exactly one prefix to lookup the suggestion since our interface doesn't allow multiple. At index time is a different story. But at serach time you should be able to tell how long the prefix is? |
-------------------------------------------------- | ||
|
||
=== Geo location Context | ||
A geo location as context information works slightly different from the other queries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A geo
context allows you to limit results to those that lie within a certain distance of a specified geolocation. At index time, a lat/long geopoint is converted into a geohash of a certain precision
, which provides the context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice if we could specify multiple precisions, eg:
precision: [ "5m", "1km", "10km"]
Then we'd need to be able to specify the precision at search time as well.
Hi @chilling I've finished reviewing the docs. I think there are improvements we can make to the API:
|
hi @clintongormley, for now I worked in most of your comments. Currently I actually working on the multiple precisions. I makes completely sense. My basic idea in the initial commit was to index the whole geohash path. Namely every prefix. Since this obviously caused trouble I removed this. But the idea of having at least a precision at the query, will solve the problem. Do you think it makes sense to index all prefixes of a geohash and the just query by a certain precision? |
Hi @chilling
That makes sense to me, but I have a feeling that it may generate very large FSTs given that you essentially add all the data 12 times. If it were possible to compress the FST that would be awesome. Failing that, I think the list of precisions is the best compromise. |
Hi @s1monw, @clintongormley and @spinscale, just finished the next iteration. Maybe you like to have a look, I hope we're getting closer. |
code changes look good to me - @clintongormley can you check if your comments / concerns were addressed? |
@@ -171,7 +171,10 @@ Kilometer:: `km` or `kilometers` | |||
Meter:: `m` or `meters` | |||
Centimeter:: `cm` or `centimeters` | |||
Millimeter:: `mm` or `millimeters` | |||
<<<<<<< HEAD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've committed the conflict markers here
Hiya @chilling I would combine the docs for category and field, as I mentioned in the comment above. That, plus fixing the bad merge with the conflict markers <<<<< and I'm +1 |
thanks @clintongormley! Can you have a last short look at this? |
LGTM |
================ This commit extends the `CompletionSuggester` by context informations. In example such a context informations can be a simple string representing a category reducing the suggestions in order to this category. Three base implementations of these context informations have been setup in this commit. - a Category Context - a Geo Context All the mapping for these context informations are specified within a context field in the completion field that should use this kind of information.
merged |
moving to 1.2 due to #5525 |
This commit extends the
CompletionSuggester
by contextinformations. In example such a context informations can
be a simple string representing a category reducing the
suggestions in order to this category.
Three base implementations of these context informations
have been setup in this commit.
All the mapping for these context informations are
specified within a context field in the completion
field that should use this kind of information.
Mapping Example
The following example shows the mapping for a GeoContext.
Indexing
During indexing a document the subfield
context
of thecompletion field contains the data to be used in order
to provide suggestions.
Suggestion Example
The Suggestion request is extended by a context value. The
suggest request for a geolocation looks like
The context objects contains a field with the same name as
defined in the mapping. According to the type of the context
this field contains the data associated with the suggestion
request. In this example the geohash of a location.
Category Context
The simplest way to use this feature is a category context. It
supports a arbitrary name of a category to use with the completion
suggest API.
To set the context support to the category type this option must be
set to
true
:The name of the context category then needs to be set within the
suggestion context during indexing:
and can be used by setting the category value:
Field Context
The Field Context works like the category context but the value of this
will context will not explicitly be set. It refers to another field in
the document. In example a
category
field.for indexing the field context must be set to true:
and suggestions use the
context.field
valueGeo Context
The last context feature is the GeoContext. It take a location into account.
For example if one searches for delivery services it might be use full to find
results around the location the query was sent. This context internally works
on geohashes only but the REST API allows any form defined for
geo_points
In the mapping this kind of context is configured by two parameters:
The
precision
option is used to configure the range of result. If theneighbors
option is enabled not only the given geohash cell will be usedbut also all it's neighbors.
The context during indexing is set to the location of the input:
To get a ist of suggestions around a specific area the
context.geo
fieldmust contain the position of this area:
Closes #3959