Extensible Completion Postings Formats #111494

Merged
Changes from 30 commits
Commits
48 commits
a76cb4d
Make completion postings format extensible
JVerwolf Jul 31, 2024
3a317a8
Spotless
JVerwolf Jul 31, 2024
f62a7a7
Spotless
JVerwolf Jul 31, 2024
ee4604d
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Jul 31, 2024
c664739
Update docs/changelog/111494.yaml
JVerwolf Jul 31, 2024
4343eea
Fix docs
JVerwolf Aug 2, 2024
35eaba5
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Aug 2, 2024
a088ae9
Make completion postings format extensible
JVerwolf Jul 31, 2024
322d176
Spotless
JVerwolf Jul 31, 2024
a88ead0
Spotless
JVerwolf Jul 31, 2024
4fcb416
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Aug 9, 2024
c9eed64
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Aug 22, 2024
fc94a32
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Aug 26, 2024
4a80fe3
Allow PostingsFormatExtension to be configured by serverless
JVerwolf Aug 27, 2024
da464b7
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Aug 28, 2024
070d451
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Aug 30, 2024
f243893
Simplify names and logic
JVerwolf Sep 6, 2024
377759c
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Sep 6, 2024
13b926a
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Oct 9, 2024
bbf06c9
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Oct 10, 2024
40fb72f
Update license header
JVerwolf Oct 10, 2024
943047d
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Oct 11, 2024
c91c23d
Add javadoc
JVerwolf Oct 11, 2024
d1f4403
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Oct 15, 2024
ade1546
Improve javadoc
JVerwolf Oct 15, 2024
a80e744
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Oct 15, 2024
86d0f7c
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Oct 16, 2024
56e2c5c
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Oct 21, 2024
8858d39
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Oct 21, 2024
e4659c2
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Oct 21, 2024
f8a0c51
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Oct 22, 2024
616dee8
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Oct 24, 2024
35f66e8
PR Feedback: remove feature
JVerwolf Oct 30, 2024
2f16043
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Oct 30, 2024
70f2143
PR Feedback: remove export and move format
JVerwolf Oct 31, 2024
23186f6
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Oct 31, 2024
ae2a0af
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Nov 4, 2024
2ab57cb
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Nov 5, 2024
add1089
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Nov 7, 2024
3f6a72e
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Nov 8, 2024
a5a60ee
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Nov 12, 2024
7d1d145
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Nov 13, 2024
93db5b2
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Nov 13, 2024
5a90785
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Nov 13, 2024
3abc777
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Nov 14, 2024
3e80ca9
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Nov 18, 2024
c3da22f
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Nov 19, 2024
ba4a91c
Merge branch 'main' of github.com:elastic/elasticsearch into enhancem…
JVerwolf Nov 28, 2024
5 changes: 5 additions & 0 deletions docs/changelog/111494.yaml
@@ -0,0 +1,5 @@
pr: 111494
summary: Extensible Completion Postings Formats
area: "Suggesters"
type: enhancement
issues: []
7 changes: 6 additions & 1 deletion server/src/main/java/module-info.java
@@ -7,6 +7,7 @@
* License v3.0 only", or the "Server Side Public License, v 1".
*/

+import org.elasticsearch.internal.CompletionsPostingsFormatExtension;
import org.elasticsearch.plugins.internal.RestExtension;

/** The Elasticsearch Server Module. */
@@ -287,7 +288,9 @@
to
org.elasticsearch.serverless.version,
org.elasticsearch.serverless.buildinfo,
-org.elasticsearch.serverless.constants;
+org.elasticsearch.serverless.constants,
+org.elasticsearch.serverless.codec,
+org.elasticsearch.stateless;
Member

Could you help me understand why we need to add org.elasticsearch.serverless.codec both here and below, and why we need to add org.elasticsearch.stateless here too?

Contributor Author

Yup, org.elasticsearch.serverless.codec and org.elasticsearch.stateless are separate Java modules. Each requires access to the CompletionsPostingsFormatExtension found in the org.elasticsearch.internal package.

@rjernst mind checking this as well to ensure this is correct? Thanks!

Member

I believe org.elasticsearch.stateless should not be needed here, but codec is, so that it can implement the internal SPI.

Contributor Author
JVerwolf Oct 31, 2024

Thanks @rjernst. I'll remove the export to stateless now that we are no longer using the Feature (for which this was previously required).
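
To illustrate the point about implementing the internal SPI: a provider module would typically pair the qualified export discussed above with a `provides` clause in its own descriptor. The following is a minimal sketch only. The module name `org.example.codec` and the class `ExampleCompletionsPostingsFormatExtension` are hypothetical, and the server module name `org.elasticsearch.server` is an assumption that this diff does not show.

```java
// module-info.java of a hypothetical provider module (not part of this PR).
module org.example.codec {
    // Assumed name of the server module that declares
    // `uses CompletionsPostingsFormatExtension` and exports the
    // org.elasticsearch.internal package to this module.
    requires org.elasticsearch.server;

    // Registers the implementation so that ServiceLoader can discover it.
    provides org.elasticsearch.internal.CompletionsPostingsFormatExtension
        with org.example.codec.ExampleCompletionsPostingsFormatExtension;
}
```

Without the qualified export, the `provides` clause would not compile, because the interface's package would not be accessible to the provider module.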

exports org.elasticsearch.lucene.analysis.miscellaneous;
exports org.elasticsearch.lucene.grouping;
exports org.elasticsearch.lucene.queries;
@@ -394,6 +397,7 @@
org.elasticsearch.stateless,
org.elasticsearch.settings.secure,
org.elasticsearch.serverless.constants,
+org.elasticsearch.serverless.codec,
org.elasticsearch.serverless.apifiltering,
org.elasticsearch.internal.security;

@@ -413,6 +417,7 @@
uses org.elasticsearch.node.internal.TerminationHandlerProvider;
uses org.elasticsearch.internal.VersionExtension;
uses org.elasticsearch.internal.BuildExtension;
+uses CompletionsPostingsFormatExtension;
uses org.elasticsearch.features.FeatureSpecification;
uses org.elasticsearch.plugins.internal.LoggingDataProvider;

@@ -20,6 +20,7 @@
import org.elasticsearch.index.codec.bloomfilter.ES87BloomFilterPostingsFormat;
import org.elasticsearch.index.codec.postings.ES812PostingsFormat;
import org.elasticsearch.index.codec.tsdb.ES87TSDBDocValuesFormat;
+import org.elasticsearch.index.mapper.CompletionFieldMapper;
import org.elasticsearch.index.mapper.IdFieldMapper;
import org.elasticsearch.index.mapper.Mapper;
import org.elasticsearch.index.mapper.MapperService;
@@ -53,9 +54,9 @@ public PostingsFormat getPostingsFormatForField(String field) {

private PostingsFormat internalGetPostingsFormatForField(String field) {
if (mapperService != null) {
-final PostingsFormat format = mapperService.mappingLookup().getPostingsFormat(field);
-if (format != null) {
-return format;
+Mapper mapper = mapperService.mappingLookup().getMapper(field);
+if (mapper instanceof CompletionFieldMapper) {
+return CompletionFieldMapper.postingsFormat();
Contributor Author

@rjernst asked me to clean this up to work similarly to getKnnVectorsFormatForField below. Now, the CompletionFieldMapper is solely responsible for exposing its own postings format.

Member

I find that this is indeed easier to follow. I have one minor suggestion for improvement: would it make sense to also move the loading of the codec name etc. to this class? Given it's just static code, I wonder if we can have it all in a single place. The main reason why I would do so is that we hardcode Completion912, and I am anxious that changes may happen in Lucene and we may forget to update this string. It is probably harder to miss if it's directly in PerFieldFormatSupplier. We can also make this change later, it's not a big deal. This is a pre-existing problem and the code looks better with your change already.

Member

+1 to following up with moving the contents of CompletionFieldMapper.postingsFormat() into here.

Contributor Author

I suppose I have some confusion as to the purview of each class with respect to which is ultimately the source of truth for the format. To me it seems like the mapper should own it, though I see mixed examples of ownership here. However, I agree it's cleaner to just move it here, so I'll do that.

}
}
// return our own posting format using PFOR
@@ -26,6 +26,8 @@
import org.elasticsearch.index.analysis.AnalyzerScope;
import org.elasticsearch.index.analysis.NamedAnalyzer;
import org.elasticsearch.index.query.SearchExecutionContext;
+import org.elasticsearch.internal.CompletionsPostingsFormatExtension;
+import org.elasticsearch.plugins.ExtensionLoader;
import org.elasticsearch.search.suggest.completion.CompletionSuggester;
import org.elasticsearch.search.suggest.completion.context.ContextMapping;
import org.elasticsearch.search.suggest.completion.context.ContextMappings;
@@ -48,6 +50,7 @@
import java.util.List;
import java.util.Map;
import java.util.Objects;
+import java.util.ServiceLoader;
import java.util.Set;

/**
@@ -344,8 +347,20 @@ public CompletionFieldType fieldType() {
return (CompletionFieldType) super.fieldType();
}

-static PostingsFormat postingsFormat() {
-return PostingsFormat.forName("Completion912");
+public static PostingsFormat postingsFormat() {
+return PostingsFormatHolder.POSTINGS_FORMAT;
+}
+
+private static class PostingsFormatHolder {
+private static final PostingsFormat POSTINGS_FORMAT = getPostingsFormat();
+
+private static PostingsFormat getPostingsFormat() {
+String defaultName = "Completion912";
+String codecName = ExtensionLoader.loadSingleton(ServiceLoader.load(CompletionsPostingsFormatExtension.class))
+.map(CompletionsPostingsFormatExtension::getFormatName)
+.orElse(defaultName);
+return PostingsFormat.forName(codecName);
+}
+}
}

@Override
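
The holder class in this hunk defers the ServiceLoader lookup until the first call to postingsFormat() and then caches the result. A self-contained sketch of that lookup-with-fallback logic might look like the following. The `FormatNameExtension` interface and the `loadSingleton` helper are stand-ins written for this example; they only mimic how CompletionsPostingsFormatExtension and ExtensionLoader.loadSingleton are used in the diff. The sketch resolves a format name rather than calling PostingsFormat.forName, so that it runs without Lucene on the classpath.

```java
import java.util.Iterator;
import java.util.Optional;
import java.util.ServiceLoader;

public class FormatNameLookup {

    /** Hypothetical stand-in for CompletionsPostingsFormatExtension. */
    public interface FormatNameExtension {
        String getFormatName();
    }

    /** Stand-in helper: the single registered provider, or empty if there is none. */
    public static <T> Optional<T> loadSingleton(ServiceLoader<T> loader) {
        Iterator<T> providers = loader.iterator();
        if (providers.hasNext() == false) {
            return Optional.empty();
        }
        T first = providers.next();
        if (providers.hasNext()) {
            throw new IllegalStateException("More than one extension found");
        }
        return Optional.of(first);
    }

    /** Uses the extension's name when present and non-null, else the default. */
    public static String resolveFormatName(Optional<FormatNameExtension> extension, String defaultName) {
        return extension.map(FormatNameExtension::getFormatName).orElse(defaultName);
    }

    /** Initialization-on-demand holder: the lookup runs once, on first access. */
    private static class Holder {
        private static final String FORMAT_NAME = resolveFormatName(
            loadSingleton(ServiceLoader.load(FormatNameExtension.class)),
            "Completion912"
        );
    }

    public static String formatName() {
        return Holder.FORMAT_NAME;
    }

    public static void main(String[] args) {
        System.out.println(formatName());
    }
}
```

With no provider registered, running this prints Completion912. Because Optional.map yields an empty Optional when the mapping function returns null, an extension that returns null from getFormatName() also falls back to the default name.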
@@ -9,7 +9,6 @@

package org.elasticsearch.index.mapper;

-import org.apache.lucene.codecs.PostingsFormat;
import org.elasticsearch.cluster.metadata.DataStream;
import org.elasticsearch.cluster.metadata.InferenceFieldMetadata;
import org.elasticsearch.index.IndexSettings;
@@ -21,7 +20,6 @@
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
-import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
@@ -58,7 +56,6 @@ private CacheKey() {}
private final Map<String, NamedAnalyzer> indexAnalyzersMap;
private final List<FieldMapper> indexTimeScriptMappers;
private final Mapping mapping;
-private final Set<String> completionFields;
private final int totalFieldsCount;

/**
@@ -161,7 +158,6 @@ private MappingLookup(
this.nestedLookup = NestedLookup.build(nestedMappers);

final Map<String, NamedAnalyzer> indexAnalyzersMap = new HashMap<>();
-final Set<String> completionFields = new HashSet<>();
final List<FieldMapper> indexTimeScriptMappers = new ArrayList<>();
for (FieldMapper mapper : mappers) {
if (objects.containsKey(mapper.fullPath())) {
@@ -174,9 +170,6 @@ private MappingLookup(
if (mapper.hasScript()) {
indexTimeScriptMappers.add(mapper);
}
-if (mapper instanceof CompletionFieldMapper) {
-completionFields.add(mapper.fullPath());
-}
}

for (FieldAliasMapper aliasMapper : aliasMappers) {
@@ -211,7 +204,6 @@ private MappingLookup(
this.objectMappers = Map.copyOf(objects);
this.runtimeFieldMappersCount = runtimeFields.size();
this.indexAnalyzersMap = Map.copyOf(indexAnalyzersMap);
-this.completionFields = Set.copyOf(completionFields);
this.indexTimeScriptMappers = List.copyOf(indexTimeScriptMappers);

runtimeFields.stream().flatMap(RuntimeField::asMappedFieldTypes).map(MappedFieldType::name).forEach(this::validateDoesNotShadow);
@@ -285,15 +277,6 @@ public Iterable<Mapper> fieldMappers() {
return fieldMappers.values();
}

-/**
-* Gets the postings format for a particular field
-* @param field the field to retrieve a postings format for
-* @return the postings format for the field, or {@code null} if the default format should be used
-*/
-public PostingsFormat getPostingsFormat(String field) {
-return completionFields.contains(field) ? CompletionFieldMapper.postingsFormat() : null;
-}
-
void checkLimits(IndexSettings settings) {
checkFieldLimit(settings.getMappingTotalFieldsLimit());
checkObjectDepthLimit(settings.getMappingDepthLimit());
@@ -0,0 +1,31 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the "Elastic License
* 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
* Public License v 1"; you may not use this file except in compliance with, at
* your election, the "Elastic License 2.0", the "GNU Affero General Public
* License v3.0 only", or the "Server Side Public License, v 1".
*/

package org.elasticsearch.internal;

import org.apache.lucene.search.suggest.document.CompletionPostingsFormat;

/**
* Allows plugging-in the Completions Postings Format.
*/
public interface CompletionsPostingsFormatExtension {

/**
* Returns the name of the {@link CompletionPostingsFormat} that Elasticsearch should use. Should return null if the extension
* is not enabled.
*/
String getFormatName();

/**
* Sets whether this extension is enabled. If the extension is not enabled, {@link #getFormatName()} should return null.
* <p>
* This allows all nodes to be upgraded to a version that supports the extension before it is enabled.
*/
void setExtensionEnabled(boolean isExtensionEnabled);
}
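
To make the contract in the Javadoc above concrete, here is a minimal sketch of what an implementation could look like. It redeclares the interface locally so that it compiles on its own. The class `ExampleExtension` and the format name `ExampleCompletionFormat` are made up for illustration and do not come from this PR.

```java
public class ExampleExtensionDemo {

    /** Local copy of the interface from the diff, so the sketch is self-contained. */
    public interface CompletionsPostingsFormatExtension {
        String getFormatName();

        void setExtensionEnabled(boolean isExtensionEnabled);
    }

    /** Hypothetical implementation; the format name is invented for this example. */
    public static final class ExampleExtension implements CompletionsPostingsFormatExtension {
        // volatile so that a toggle from another thread is visible to readers.
        private volatile boolean enabled = false;

        @Override
        public String getFormatName() {
            // Per the documented contract: null while the extension is disabled.
            return enabled ? "ExampleCompletionFormat" : null;
        }

        @Override
        public void setExtensionEnabled(boolean isExtensionEnabled) {
            this.enabled = isExtensionEnabled;
        }
    }

    public static void main(String[] args) {
        ExampleExtension extension = new ExampleExtension();
        System.out.println(extension.getFormatName()); // disabled: prints null
        extension.setExtensionEnabled(true);
        System.out.println(extension.getFormatName()); // enabled: prints the name
    }
}
```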