-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Extensible Completion Postings Formats #111494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extensible Completion Postings Formats #111494
Conversation
Pinging @elastic/es-search (Team:Search) |
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
Mapper mapper = mapperService.mappingLookup().getMapper(field); | ||
if (mapper instanceof CompletionFieldMapper) { | ||
return CompletionFieldMapper.postingsFormat(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjernst asked me to clean this up to work similarly to getKnnVectorsFormatForField
below. Now, the CompletionFieldMapper
is solely responsible for exposing it's own postings format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find that this is indeed easier to follow. I have one minor suggestion for improvement: would it make sense to move the loading of the codec name etc. also to this class? Given it's just static code, I wonder if we can have it all in a single place. The main reason why I would do so is that we hardcode Competion912
, and I am anxious that changes may happen in Lucene and we may forget to update this string. It is probably easier to miss if it's directly in PerFieldFormatSupplier. We can also make this change later, it's not a big deal. This is a pre-existing problem and the code looks better with your change already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to following up with moving the contents of CompletionFieldMapper.postingsFormat()
into here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose I have some confusion as the the purview of each class with respect to who ultimately is the source of truth for the format. To me it seems like the mapper should own it, though I see mixed examples of ownership here. I however agree it's cleaner to just move it here, so I'll do that.
server/src/main/java/org/elasticsearch/index/mapper/CompletionFieldMapper.java
Outdated
Show resolved
Hide resolved
…ent/make-fst-on-heap-configurable
Hi @JVerwolf, I've created a changelog YAML for you. |
@elasticmachine run elasticsearch-ci/validate-changelogs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a rolling upgrade test that will exercise this code ?
We need something that indexes with old format.
Indexes with a mixed cluster and merges some segments.
Indexes with new and merges some segments.
Thanks for the review @benwtrent ! Good points, though IIUC, I think this applies more to code that will use this PR (that will provided another codec), rather than to this code, per se. Given that I do have a PR (in a private repo) that uses this, I'll work out how to add rolling upgrade tests there. WDYT? |
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, I left a couple of small comments.
Mapper mapper = mapperService.mappingLookup().getMapper(field); | ||
if (mapper instanceof CompletionFieldMapper) { | ||
return CompletionFieldMapper.postingsFormat(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find that this is indeed easier to follow. I have one minor suggestion for improvement: would it make sense to move the loading of the codec name etc. also to this class? Given it's just static code, I wonder if we can have it all in a single place. The main reason why I would do so is that we hardcode Competion912
, and I am anxious that changes may happen in Lucene and we may forget to update this string. It is probably easier to miss if it's directly in PerFieldFormatSupplier. We can also make this change later, it's not a big deal. This is a pre-existing problem and the code looks better with your change already.
server/src/main/java/org/elasticsearch/internal/CompletionsPostingsFormatExtension.java
Show resolved
Hide resolved
org.elasticsearch.serverless.constants; | ||
org.elasticsearch.serverless.constants, | ||
org.elasticsearch.serverless.codec, | ||
org.elasticsearch.stateless; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you help me understand why we need to add org.elasticsearch.serverless.codec
here as well and below, and why we need to add org.elasticsearch.stateless
here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, org.elasticsearch.serverless.codec
and org.elasticsearch.stateless
are separate java modules. Each requires access to the CompletionsPostingsFormatExtension
found in the org.elasticsearch.internal
java module.
@rjernst mind checking this as well to ensure this is correct? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe org.elasticsearch.stateless
should not be needed here, but codec is so that it can implement the internal spi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rjernst. I'll remove the export to stateless
now that we are no longer using the Feature (for which this was previously required).
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
org.elasticsearch.serverless.constants; | ||
org.elasticsearch.serverless.constants, | ||
org.elasticsearch.serverless.codec, | ||
org.elasticsearch.stateless; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe org.elasticsearch.stateless
should not be needed here, but codec is so that it can implement the internal spi.
Mapper mapper = mapperService.mappingLookup().getMapper(field); | ||
if (mapper instanceof CompletionFieldMapper) { | ||
return CompletionFieldMapper.postingsFormat(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to following up with moving the contents of CompletionFieldMapper.postingsFormat()
into here.
…ent/make-fst-on-heap-configurable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
@elasticmachine run elasticsearch-ci/part-3 |
A while ago we introduced a completion postings format extension to eventually be able to customize how completion FSTs are loaded. See elastic#111494. We have never leveraged this extension, and meanwhile Lucene is moving to always load FSTs off-heap, and no longer allow on-heap. See apache/lucene#14364 . This commit removes the SPI extension as it is no longer needed.
A while ago we introduced a completion postings format extension to eventually be able to customize how completion FSTs are loaded. See #111494. We have never leveraged this extension, and meanwhile Lucene is moving to always load FSTs off-heap, and no longer allow on-heap. See apache/lucene#14364 . This commit removes the SPI extension as it is no longer needed.
A while ago we introduced a completion postings format extension to eventually be able to customize how completion FSTs are loaded. See elastic#111494. We have never leveraged this extension, and meanwhile Lucene is moving to always load FSTs off-heap, and no longer allow on-heap. See apache/lucene#14364 . This commit removes the SPI extension as it is no longer needed.
A while ago we introduced a completion postings format extension to eventually be able to customize how completion FSTs are loaded. See elastic#111494. We have never leveraged this extension, and meanwhile Lucene is moving to always load FSTs off-heap, and no longer allow on-heap. See apache/lucene#14364 . This commit removes the SPI extension as it is no longer needed.
A while ago we introduced a completion postings format extension to eventually be able to customize how completion FSTs are loaded. See elastic#111494. We have never leveraged this extension, and meanwhile Lucene is moving to always load FSTs off-heap, and no longer allow on-heap. See apache/lucene#14364 . This commit removes the SPI extension as it is no longer needed.
This PR allows the completion posting format loaded by ES to be extensible.
Lucene uses
PostingsFormat.forName(String)
to load a postings format identified by the String arg. This PR provides an extension that allows alternative names to be provided at the point we load postings format in ES at write-time. This name is then written to the shard info. When the shard is read at search-time, Lucene looks for postings formats with the same type on the classpath (as declared though module info via aprovides
directive), then picks the one with the same name.