Terms enum for version fields #93839
💚 CLA has been signed
Pinging @elastic/es-search (Team:Search)
Hi @cbuescher, I've created a changelog YAML for you.
The _terms_enum API currently only supports keyword, constant_keyword and flattened field types. This change adds support for the `version` field type that sorts according to the semantic versioning definition. Closes elastic#83403
c34c637 to 261545a
I think this can be a lot simpler if we retrieve terms from the doc values terms dictionary via
@romseygeek thanks for looking into this. I'm afraid we can't get away without the encoding step; that is what I initially tried, and it failed. We store both the indexed and the doc-value terms in an encoded form specific to the version field, which preserves semantic-version ordering. We somehow have to convert that back to a string representation, and if we do that too early we lose the ordering, e.g. when merging shard-level terms enum results in MultiShardTermsEnum. Happy to take another look though.
Ah ok, yes I see. I wonder if it's possible to re-use something like
Great idea, will look into that.
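To illustrate the ordering concern raised in this exchange, here is a minimal sketch. It assumes a toy encoding (zero-padded numeric parts), which is not the actual encoding used by the `version` field, and the class and method names are hypothetical. It only shows why shard results should be merged in their encoded form and decoded to display strings afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Elasticsearch code.
class VersionOrderSketch {

    // Toy order-preserving encoding (an assumption for this sketch): zero-pad each
    // numeric part to 5 digits so plain string comparison matches numeric comparison.
    static String encode(String version) {
        StringBuilder sb = new StringBuilder();
        for (String part : version.split("\\.")) {
            sb.append(String.format("%05d", Integer.parseInt(part))).append('.');
        }
        return sb.toString();
    }

    // Inverse of encode(): strip the padding again for display.
    static String decode(String encoded) {
        List<String> parts = new ArrayList<>();
        for (String part : encoded.split("\\.")) {
            parts.add(String.valueOf(Integer.parseInt(part)));
        }
        return String.join(".", parts);
    }

    // Merge two lists that are each sorted by their ENCODED form,
    // and decode every term only when it is emitted.
    static List<String> mergeThenDecode(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() || j < b.size()) {
            String next;
            if (j >= b.size() || (i < a.size() && a.get(i).compareTo(b.get(j)) <= 0)) {
                next = a.get(i++);
            } else {
                next = b.get(j++);
            }
            out.add(decode(next));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> shardA = List.of(encode("1.2.0"), encode("1.10.0"));
        List<String> shardB = List.of(encode("1.9.0"));
        System.out.println(mergeThenDecode(shardA, shardB));
        // Comparing the decoded strings directly would sort "1.10.0" before "1.9.0".
        System.out.println("1.10.0".compareTo("1.9.0") < 0);
    }
}
```

If the terms were decoded on each shard first, the merge step would compare plain strings and place `1.10.0` ahead of `1.9.0`, which is the loss of ordering described above.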
@@ -359,15 +361,15 @@ protected NodeTermsEnumResponse dataNodeOperation(NodeTermsEnumRequest request,
                     request.searchAfter()
                 );
                 if (terms != null) {
-                    shardTermsEnums.add(terms);
+                    shardTermsEnums.add(new ShardTermsEnum(terms, mappedFieldType::valueForDisplay));
We still need to maintain the connection from a shard level enumeration to the field type that generated it since we can ask for terms enumerations across indices that have mixed type mappings for the same field name (i.e. keyword for older indices, version for newer ones).
Thanks, that's much nicer. I left a couple more suggestions.
        return current;
    }

    public long docFreq() throws IOException {
Need to adjust the javadoc at the top of the class to not refer to this anymore?
     * @param enums TermsEnums from shards which we should merge
     * @throws IOException Errors accessing data
     **/
-    public MultiShardTermsEnum(TermsEnum[] enums) throws IOException {
+    public MultiShardTermsEnum(TransportTermsEnumAction.ShardTermsEnum[] enums) throws IOException {
This might be neater as a builder object, and then ShardTermsEnum can become private to MultiShardTermsEnum. So the caller is something like:
MultiShardTermsEnum.Builder termsBuilder = new MultiShardTermsEnum.Builder();
for (ShardId shardId : request.shardIds()) {
    ...
    termsBuilder.add(fieldType.getTerms( ... ), fieldType::valueForDisplay);
    ...
}
MultiShardTermsEnum terms = termsBuilder.build();
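The builder idea above, together with the earlier point that each shard-level enumeration has to stay tied to the field type that produced it, can be sketched as follows. This is a simplified stand-in and not the code from the PR: `MergedTermsSketch` is a hypothetical name, plain `String` values replace Lucene's `TermsEnum` and `BytesRef`, and duplicate terms across shards are not collapsed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Function;

// Hypothetical stand-in for a multi-shard terms merger; not Elasticsearch code.
class MergedTermsSketch {

    // Pairs one shard's sorted terms with that shard's own display function.
    // Kept private so callers only ever see the Builder.
    private static final class ShardTerms {
        final Iterator<String> terms;
        final Function<String, String> display;
        String current;

        ShardTerms(Iterator<String> terms, Function<String, String> display) {
            this.terms = terms;
            this.display = display;
        }

        // Moves to the next term; returns false once the shard is exhausted.
        boolean advance() {
            if (terms.hasNext()) {
                current = terms.next();
                return true;
            }
            current = null;
            return false;
        }
    }

    static final class Builder {
        private final List<ShardTerms> shards = new ArrayList<>();

        Builder add(List<String> sortedTerms, Function<String, String> display) {
            shards.add(new ShardTerms(sortedTerms.iterator(), display));
            return this;
        }

        MergedTermsSketch build() {
            return new MergedTermsSketch(shards);
        }
    }

    private final PriorityQueue<ShardTerms> queue;

    private MergedTermsSketch(List<ShardTerms> shards) {
        // Order shards by their current INTERNAL term, not by the display value.
        queue = new PriorityQueue<>((a, b) -> a.current.compareTo(b.current));
        for (ShardTerms shard : shards) {
            if (shard.advance()) {
                queue.add(shard);
            }
        }
    }

    // Emits all terms in internal order, converting each one with the display
    // function of the shard it came from.
    List<String> drain() {
        List<String> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            ShardTerms top = queue.poll();
            out.add(top.display.apply(top.current));
            if (top.advance()) {
                queue.add(top);
            }
        }
        return out;
    }
}
```

A caller might write `new MergedTermsSketch.Builder().add(termsA, displayA).add(termsB, displayB).build().drain()`, so shards with different mappings for the same field name can each contribute their own conversion.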
@romseygeek thanks, great suggestions, I pushed an update.
LGTM, thanks @cbuescher
A recent change (elastic#93839) introduced _terms_enum support for fields that need to convert their internal bytes representation to the proper string representation for display purposes. The constant_keyword and flattened field types didn't implement a 'valueForDisplay' method yet, so the underlying BytesRef was printed directly in the response. Closes elastic#94041
A recent change (#93839) introduced `_terms_enum` support for fields that need to convert their internal bytes representation to the proper string representation for display purposes. The `constant_keyword` and `flattened` field types didn't implement a `valueForDisplay` method yet, so the underlying `BytesRef` was printed directly in the response. This change fixes that and adds tests to ensure human-readable response values for those field types. Closes #94041
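As a rough illustration of the difference between printing raw bytes and converting them for display, consider the sketch below. It does not use Lucene or Elasticsearch classes: `DisplayValueSketch` and both of its methods are hypothetical, and the hex rendering only stands in for whatever a byte container prints by default.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Elasticsearch code.
class DisplayValueSketch {

    // Debug-style rendering of the raw bytes, e.g. "[66 6f 6f]" for "foo".
    static String rawRendering(byte[] bytes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(bytes[i] & 0xff));
        }
        return sb.append(']').toString();
    }

    // Display conversion for a field whose internal bytes are plain UTF-8 text.
    static String valueForDisplay(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] term = "foo".getBytes(StandardCharsets.UTF_8);
        System.out.println(rawRendering(term));
        System.out.println(valueForDisplay(term));
    }
}
```

A field type that skips the display conversion would surface something like the first form in the response, which is the symptom the follow-up change addresses.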
The `_terms_enum` API currently supports keyword, constant_keyword and flattened fields. This change adds this support for the `version` field type as well, as it was originally intended to be a specialization of the keyword field for handling software version values.

Closes #83403