Skip to content

Extensible Completion Postings Formats #111494

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

JVerwolf
Copy link
Contributor

This PR allows the completion posting format loaded by ES to be extensible.

Lucene uses PostingsFormat.forName(String) to load a postings format identified by the String arg. This PR provides an extension that allows alternative names to be provided at the point we load postings format in ES at write-time. This name is then written to the shard info. When the shard is read at search-time, Lucene looks for postings formats with the same type on the classpath (as declared though module info via a provides directive), then picks the one with the same name.

@JVerwolf JVerwolf added :Search Foundations/Mapping Index mappings, including merging and defining field types :Search/Search Search-related issues that do not fall into other categories labels Jul 31, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@elasticsearchmachine elasticsearchmachine added Team:Search Meta label for search team Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch labels Jul 31, 2024
Comment on lines 62 to 64
Mapper mapper = mapperService.mappingLookup().getMapper(field);
if (mapper instanceof CompletionFieldMapper) {
return CompletionFieldMapper.postingsFormat();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjernst asked me to clean this up to work similarly to getKnnVectorsFormatForField below. Now, the CompletionFieldMapper is solely responsible for exposing it's own postings format.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find that this is indeed easier to follow. I have one minor suggestion for improvement: would it make sense to move the loading of the codec name etc. also to this class? Given it's just static code, I wonder if we can have it all in a single place. The main reason why I would do so is that we hardcode Competion912 , and I am anxious that changes may happen in Lucene and we may forget to update this string. It is probably easier to miss if it's directly in PerFieldFormatSupplier. We can also make this change later, it's not a big deal. This is a pre-existing problem and the code looks better with your change already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to following up with moving the contents of CompletionFieldMapper.postingsFormat() into here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I have some confusion as the the purview of each class with respect to who ultimately is the source of truth for the format. To me it seems like the mapper should own it, though I see mixed examples of ownership here. I however agree it's cleaner to just move it here, so I'll do that.

@elasticsearchmachine
Copy link
Collaborator

Hi @JVerwolf, I've created a changelog YAML for you.

@JVerwolf
Copy link
Contributor Author

JVerwolf commented Aug 1, 2024

@elasticmachine run elasticsearch-ci/validate-changelogs

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a rolling upgrade test that will exercise this code ?

We need something that indexes with old format.

Indexes with a mixed cluster and merges some segments.

Indexes with new and merges some segments.

@JVerwolf
Copy link
Contributor Author

JVerwolf commented Aug 9, 2024

Thanks for the review @benwtrent ! Good points, though IIUC, I think this applies more to code that will use this PR (that will provided another codec), rather than to this code, per se. Given that I do have a PR (in a private repo) that uses this, I'll work out how to add rolling upgrade tests there. WDYT?

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, I left a couple of small comments.

Comment on lines 62 to 64
Mapper mapper = mapperService.mappingLookup().getMapper(field);
if (mapper instanceof CompletionFieldMapper) {
return CompletionFieldMapper.postingsFormat();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find that this is indeed easier to follow. I have one minor suggestion for improvement: would it make sense to move the loading of the codec name etc. also to this class? Given it's just static code, I wonder if we can have it all in a single place. The main reason why I would do so is that we hardcode Competion912 , and I am anxious that changes may happen in Lucene and we may forget to update this string. It is probably easier to miss if it's directly in PerFieldFormatSupplier. We can also make this change later, it's not a big deal. This is a pre-existing problem and the code looks better with your change already.

org.elasticsearch.serverless.constants;
org.elasticsearch.serverless.constants,
org.elasticsearch.serverless.codec,
org.elasticsearch.stateless;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you help me understand why we need to add org.elasticsearch.serverless.codec here as well and below, and why we need to add org.elasticsearch.stateless here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, org.elasticsearch.serverless.codec and org.elasticsearch.stateless are separate java modules. Each requires access to the CompletionsPostingsFormatExtension found in the org.elasticsearch.internal java module.

@rjernst mind checking this as well to ensure this is correct? Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe org.elasticsearch.stateless should not be needed here, but codec is so that it can implement the internal spi.

Copy link
Contributor Author

@JVerwolf JVerwolf Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rjernst. I'll remove the export to stateless now that we are no longer using the Feature (for which this was previously required).

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

org.elasticsearch.serverless.constants;
org.elasticsearch.serverless.constants,
org.elasticsearch.serverless.codec,
org.elasticsearch.stateless;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe org.elasticsearch.stateless should not be needed here, but codec is so that it can implement the internal spi.

Comment on lines 62 to 64
Mapper mapper = mapperService.mappingLookup().getMapper(field);
if (mapper instanceof CompletionFieldMapper) {
return CompletionFieldMapper.postingsFormat();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to following up with moving the contents of CompletionFieldMapper.postingsFormat() into here.

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Nov 14, 2024
@JVerwolf
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/part-3

@JVerwolf JVerwolf merged commit 8350ff2 into elastic:main Nov 28, 2024
16 checks passed
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2025
A while ago we introduced a completion postings format extension to eventually
be able to customize how completion FSTs are loaded. See elastic#111494.

We have never leveraged this extension, and meanwhile Lucene is moving
to always load FSTs off-heap, and no longer allow on-heap.
See apache/lucene#14364 .

This commit removes the SPI extension as it is no longer needed.
javanna added a commit that referenced this pull request Mar 20, 2025
A while ago we introduced a completion postings format extension to eventually
be able to customize how completion FSTs are loaded. See #111494.

We have never leveraged this extension, and meanwhile Lucene is moving
to always load FSTs off-heap, and no longer allow on-heap.
See apache/lucene#14364 .

This commit removes the SPI extension as it is no longer needed.
afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Mar 21, 2025
A while ago we introduced a completion postings format extension to eventually
be able to customize how completion FSTs are loaded. See elastic#111494.

We have never leveraged this extension, and meanwhile Lucene is moving
to always load FSTs off-heap, and no longer allow on-heap.
See apache/lucene#14364 .

This commit removes the SPI extension as it is no longer needed.
smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Mar 21, 2025
A while ago we introduced a completion postings format extension to eventually
be able to customize how completion FSTs are loaded. See elastic#111494.

We have never leveraged this extension, and meanwhile Lucene is moving
to always load FSTs off-heap, and no longer allow on-heap.
See apache/lucene#14364 .

This commit removes the SPI extension as it is no longer needed.
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
A while ago we introduced a completion postings format extension to eventually
be able to customize how completion FSTs are loaded. See elastic#111494.

We have never leveraged this extension, and meanwhile Lucene is moving
to always load FSTs off-heap, and no longer allow on-heap.
See apache/lucene#14364 .

This commit removes the SPI extension as it is no longer needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types serverless-linked Added by automation, don't add manually Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants