Skip to content

Fix Rollup job creation to work with templates #43943

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 6, 2019

Conversation

polyfractal
Copy link
Contributor

The PutJob API accidentally used an "expert" API of CreateIndexRequest. That API is semi-lenient to syntax; a type could be omitted and the request would work as expected. But if a type was omitted it would not merge with templates correctly, leading to index creation that only has the template and not the requested mappings in the request. This semi-lenient behavior is why it went unnoticed until now (no tests that also use unrelate templates)

This commit refactors the PutJob API to:

  • Include the type name
  • Use a less "expert" API in an attempt to future proof against errors
  • Uses maps instead of string parsing, which is easier to deal with
    and more robust

The PutJob API accidentally used an "expert" API of CreateIndexRequest.
That API is semi-lenient to syntax; a type could be omitted and the
request would work as expected.  But if a type was omitted it would
not merge with templates correctly, leading to index creation that
only has the template and not the requested mappings in the request.

This commit refactors the PutJob API to:

- Include the type name
- Use a less "expert" API in an attempt to future proof against errors
- Uses maps instead of string parsing, which is easier to deal with
and more robust
@polyfractal polyfractal added >bug :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data v8.0.0 v7.4.0 labels Jul 3, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo

@polyfractal
Copy link
Contributor Author

@elasticmachine update branch

@polyfractal
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/2

@polyfractal polyfractal requested a review from jimczi July 25, 2019 19:42

String mapping = Rollup.DYNAMIC_MAPPING_TEMPLATE
.replace(Rollup.MAPPING_METADATA_PLACEHOLDER, jobMetadata);
((Map<String, Object>)((Map<String, Object>) ((Map<String, Object>) ((Map<String, Object>) mapping.get("mappings"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is...uhh... not ideal, but seemed less hacky than doing a string replace. I could be convinced otherwise, open to alternatives.

Copy link
Contributor

@jimczi jimczi Jul 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should use an XContentBuilder directly to simplify the understanding of this code. Something like:

private static XContentBuilder createMappings(RollupJobConfig config) throws IOException {
        return XContentBuilder.builder(XContentType.JSON.xContent())
            .startObject()
                .startObject("mappings")
                    .startObject("_doc")
                        .startObject("_meta")
                            .field("rollup-version", Version.CURRENT)
                            .startObject("_rollup")
                                .field(config.getId(), config)
                            .endObject()
                        .endObject()
                        .startArray("dynamic_templates")
                            .startObject()
                                .startObject("strings")
                                    .field("match_mapping_type", "string")
                                    .startObject("mapping")
                                        .field("type", "keyword")
                                    .endObject()
                                .endObject()
                            .endObject()
                            .startObject()
                                .startObject("date_histograms")
                                    .field("path_match", "*.date_histogram.timestamp")
                                    .startObject("mapping")
                                        .field("type", "date")
                                    .endObject()
                                .endObject()
                            .endObject()
                        .endArray()
                    .endObject()
                .endObject()
            .endObject();
    }

... seems easier to read imo. We could also retrieve the dynamic_templates from a file like we do today but we have only 2 rules so I am not too concerned by the inlining here. What do you think ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a hundred times better :) Will make the change

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review @polyfractal . I left one comment.


String mapping = Rollup.DYNAMIC_MAPPING_TEMPLATE
.replace(Rollup.MAPPING_METADATA_PLACEHOLDER, jobMetadata);
((Map<String, Object>)((Map<String, Object>) ((Map<String, Object>) ((Map<String, Object>) mapping.get("mappings"))
Copy link
Contributor

@jimczi jimczi Jul 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should use an XContentBuilder directly to simplify the understanding of this code. Something like:

private static XContentBuilder createMappings(RollupJobConfig config) throws IOException {
        return XContentBuilder.builder(XContentType.JSON.xContent())
            .startObject()
                .startObject("mappings")
                    .startObject("_doc")
                        .startObject("_meta")
                            .field("rollup-version", Version.CURRENT)
                            .startObject("_rollup")
                                .field(config.getId(), config)
                            .endObject()
                        .endObject()
                        .startArray("dynamic_templates")
                            .startObject()
                                .startObject("strings")
                                    .field("match_mapping_type", "string")
                                    .startObject("mapping")
                                        .field("type", "keyword")
                                    .endObject()
                                .endObject()
                            .endObject()
                            .startObject()
                                .startObject("date_histograms")
                                    .field("path_match", "*.date_histogram.timestamp")
                                    .startObject("mapping")
                                        .field("type", "date")
                                    .endObject()
                                .endObject()
                            .endObject()
                        .endArray()
                    .endObject()
                .endObject()
            .endObject();
    }

... seems easier to read imo. We could also retrieve the dynamic_templates from a file like we do today but we have only 2 rules so I am not too concerned by the inlining here. What do you think ?

@polyfractal
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@polyfractal polyfractal merged commit ed3da66 into elastic:master Aug 6, 2019
polyfractal added a commit that referenced this pull request Aug 6, 2019
The PutJob API accidentally used an "expert" API of CreateIndexRequest.
That API is semi-lenient to syntax; a type could be omitted and the
request would work as expected.  But if a type was omitted it would
not merge with templates correctly, leading to index creation that
only has the template and not the requested mappings in the request.

This commit refactors the PutJob API to:

- Include the type name
- Use a less "expert" API in an attempt to future proof against errors
- Uses an XContentBuilder instead of string replacing, removes json template
@polyfractal
Copy link
Contributor Author

It's come to my attention that the impact of this bug is wider than I thought. My original thinking was that only index patterns targeting Rollups were problematic, and should be relatively rare. But since this is affected by any template, common templates like * (used to set shard counts across the board, etc) will also cause problems due to this bug.

We might want to look into backporting this further. I'll take a look and raise this for discussion with the team

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data team-discuss v7.4.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants