-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Text fields are stored by default in TSDB indices #106338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Synthetic source requires text fields to be stored or have keyword sub-field that supports synthetic source. If there are no keyword fields users currently have to explicitly set 'store' to 'true' or get a validation exception. This is not the best experience. It is quite likely that setting `store` to `true` is the correct thing to do but users still get an error and need to investigate it. With this change if `store` setting is not specified in such context it will be set to `true` by default. Setting it explicitly to `false` results in the exception. Closes elastic#97039
Documentation preview: |
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
Hi @lkts, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I left a comment about a potential different direction we may want to take here.
// | ||
// If 'store' parameter was explicitly provided we'll let it pass and | ||
// fail in TextFieldMapper#syntheticFieldLoader later if needed. | ||
if (store.isSet() == false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we would do default values and validation via the Parameter
class in the Builder
constructor in the file. Something like this:
this.store = Parameter.storeParam(m -> ((TextFieldMapper) m).store, () -> syntheticSource).addValidator(value -> {
if (syntheticSource && value == false) {
throw new IllegalArgumentException("If synthetic source is enabled, then store cannot be set to false");
}
});
The issue is that now we can't see the details we care about here in the multi fields (MultiFields.Builder
only keep a map with a Function
per field name around).
Note that this way does require more plumbing and additional changes.
If we like to do this approach we may need to make changes to MultiFields.Builder
, so that we know whether there is a keyword field (either stored or doesn't have normalizer and ignore above) as multi field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this approach and there is a problem. It is not possible to have syntheticSource
flag with proposed impl from gist in MappingParserContext
because it will return if synthetic source is enabled for currently existing mapping or empty mapping in case of index creation. mappingLookup
will only be updated as a last step after parsing and building new mapping.
Value that we need is here https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/index/mapper/MappingParser.java#L147 and it is not available to field mapper builders since builders are created in RootObjectMapper#parse
which happens before that.
We could try:
- Creating metadata field mappers before
RootObjectMapper
. I assume that there is a good reason for that ordering so this is unlikely a solution. - Support storing text fields by default only for TSDB indices (and any future index modes that will have synthethic source always enabled) instead of for any index with synthetic source enabled. This works since we know source is synthetic even without parsing the mapping. I think we are mostly interested in this use case anyway so this may be a good solution.
- Keep initial implementation from this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing down this analysis. It is kind of a pain that we know synthetic source is used after the builders are created. I don't think it is worth changing the construction of mapping builders, since it most likely is a tricky change.
Regarding option 2, we know if index mode has been set to time series that synthetic source is enabled, and this will also be the case for the new logging index mode. Given that index mode comes from index setting, we know at the time mapping builders are created what the index mode has been set to. So we can use that as well for determining the default value of the store
attribute. So let's do option 2?
server/src/main/java/org/elasticsearch/index/mapper/TextFieldMapper.java
Show resolved
Hide resolved
Pinging @elastic/es-search (Team:Search) |
I have pushed change to use option 2 from the discussion above. One issue i discovered is that it is not possible to implement validation using As a result i removed validation logic and i am relying on existing code for that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue i discovered is that it is not possible to implement validation using Parameter#addValidator API. The problem is that we need to provide full name of the field in the error message e.g. k8s.pod.agent.name (as we currently do). It is not possible to get this information in constructor of TextFieldMapper.Builder, this information is only exposed during build() via MapperBuilderContext#buildFullName. We can only report field name as name in this case which is obviously confusing.
I agree this part is confusion, so +1 to rely on the currently validation if store is specifically disabled.
This change is looking good, a left a few more (minor) comments.
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false` | ||
(default). | ||
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false` (default). | ||
To support <<synthetic-source,synthetic `_source`>> this parameter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename To support <<synthetic-source,synthetic
_source>> this parameter
to If <<synthetic-source,synthetic
_source>> is enabled, then this parameter
?
@@ -283,12 +283,23 @@ public static class Builder extends FieldMapper.Builder { | |||
|
|||
final TextParams.Analyzers analyzers; | |||
|
|||
public Builder(String name, IndexAnalyzers indexAnalyzers) { | |||
this(name, IndexVersion.current(), indexAnalyzers); | |||
public Builder(String name, IndexAnalyzers indexAnalyzers, boolean isSyntheticSourceAlwaysUsed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename to synthetcSourceEnabledViaIndexMode
? To indicate that synthetic source is enabled because the index mode enables it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to express that and your suggestion is so much more clear!
@@ -1258,7 +1265,7 @@ public Map<String, NamedAnalyzer> indexAnalyzers() { | |||
|
|||
@Override | |||
public FieldMapper.Builder getMergeBuilder() { | |||
return new Builder(simpleName(), indexCreatedVersion, indexAnalyzers).init(this); | |||
return new Builder(simpleName(), indexCreatedVersion, indexAnalyzers, false).init(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always providing false
here is incorrect. It could disable synthetic source at some point. I think we need record in TextFieldMapper
class, that synthetic source was enabled, so the right value can be provided here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it can actually disable it since the default value should be already set correctly before that and init
will pick it up. I do agree it is more clear to store it though.
server/src/main/java/org/elasticsearch/index/query/QueryRewriteContext.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/index/fielddata/AbstractFieldDataTestCase.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/index/fielddata/FilterFieldDataTests.java
Outdated
Show resolved
Hide resolved
CI uncovered another interesting problem. elasticsearch/server/src/main/java/org/elasticsearch/index/mapper/TextFieldMapper.java Line 1423 in 2b67444
includeDefaults seems to be always false . During downsampling new index is created which is not a TSDB index and since store value was not serialized it defaults to false . We successfully transfer synthetic source setting from original index and as a result we now have synthetic source and text field that is not stored.
I have fixed serialization but i honestly don't know if it is a proper solution. I don't really like the state of "defaults can be interpreted differently in different contexts". |
// we want to serialize 'true' value even if it is default | ||
// since defaults differ for different index modes | ||
// and can be interpreted differently is different context | ||
.setSerializerCheck((includeDefaults, isConfigured, value) -> includeDefaults || isConfigured || value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A downsampled index should have index.mode=time_series. What exact failure occurs if this check is missing? Maybe we need to change something in TransportDownsampleAction
instead.
Downsample index is created using temporary index service using a static predefined subset of index settings. All other settings are later copied over from source index. As discovered in elastic#106338 this causes context like index mode to be missing during initial index creation process. This PR adds index mode and related required settings to initial set of index settings in order to have access to this information during initial create index operation.
* Set index mode earlier for new downsample index Downsample index is created using temporary index service using a static predefined subset of index settings. All other settings are later copied over from source index. As discovered in #106338 this causes context like index mode to be missing during initial index creation process. This PR adds index mode and related required settings to initial set of index settings in order to have access to this information during initial create index operation.
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left two comments. LGTM otherwise.
@@ -249,6 +251,61 @@ public void testDefaults() throws IOException { | |||
assertEquals(DocValuesType.NONE, fieldType.docValuesType()); | |||
} | |||
|
|||
public void testStoreParameterDefaults() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test needs to be adjusted with the serialization logic removed to address the CI failure?
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false` | ||
(default). | ||
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false` (default). | ||
If <<synthetic-source,synthetic _source>> is enabled, then this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe remove If <<synthetic-source,synthetic _source>> is enabled
part? The store
param defaults to true
for tsdb indices, when there is no suitable multi field of type keyword .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woops, thanks. This is leftover.
Synthetic source requires text fields to be stored or have keyword
sub-field that supports synthetic source. If there are no keyword fields
users currently have to explicitly set 'store' to 'true' or get a
validation exception. This is not the best experience. It is quite
likely that setting
store
totrue
is the correct thing to do butusers still get an error and need to investigate it. With this change if
store
setting is not specified in such context it will be set totrue
by default. Setting it explicitly tofalse
results in theexception.
Closes #97039