Rollover add max_primary_shard_docs condition #80981

weizijun · 2021-11-24T03:51:34Z

Rollover currently supports four conditions:

max_age
max_docs
max_size
max_primary_shard_size

Add a new condition named max_primary_shard_docs.

Triggers rollover when the largest primary shard in the index reaches a certain number of documents.
This is the maximum document count among the primary shards in the index. As with max_docs, replicas are ignored.
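A minimal sketch of how the proposed condition differs from max_docs, assuming both compare a document count against a threshold with "greater than or equal". The class and method names below are hypothetical and are not Elasticsearch code; they only illustrate that one check looks at the total across primary shards and the other at the largest single primary shard.

```java
import java.util.List;

// Hypothetical illustration only; these are not Elasticsearch classes.
final class RolloverDocConditions {

    // max_docs style check: total documents across all primary shards.
    static boolean maxDocsMet(List<Long> primaryShardDocs, long threshold) {
        long total = 0;
        for (long docs : primaryShardDocs) {
            total += docs;
        }
        return total >= threshold;
    }

    // max_primary_shard_docs style check: only the largest primary shard counts.
    static boolean maxPrimaryShardDocsMet(List<Long> primaryShardDocs, long threshold) {
        long largest = 0;
        for (long docs : primaryShardDocs) {
            largest = Math.max(largest, docs);
        }
        return largest >= threshold;
    }

    public static void main(String[] args) {
        // Unevenly weighted shards, e.g. caused by custom routing.
        List<Long> docs = List.of(90L, 5L, 5L);
        System.out.println(maxDocsMet(docs, 200L));            // total is 100, below 200
        System.out.println(maxPrimaryShardDocsMet(docs, 80L)); // largest shard is 90, at least 80
    }
}
```

With skewed shards the total can stay well under a max_docs threshold while one shard has already grown past the per-shard limit.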

dakrone · 2021-11-29T23:27:29Z

@weizijun can you explain why you think this addition would be useful versus the regular max_docs condition? Are you concerned about unevenly weighted shards using custom routing during indexing?

weizijun · 2021-11-30T06:52:36Z

> @weizijun can you explain why you think this addition would be useful versus the regular max_docs condition? Are you concerned about unevenly weighted shards using custom routing during indexing?

@dakrone I think it has the following benefits:

It caps the number of docs in a single shard. If a shard holds too many docs, rollup, force_merge and search latency are affected.
As you mentioned, it avoids the problem of unevenly weighted shards.
It does not depend on number_of_shards, only on the max docs in one shard, so the same policy can be used for many kinds of indices.

dakrone · 2021-11-30T15:58:29Z

> It does not depend on number_of_shards, only on the max docs in one shard, so the same policy can be used for many kinds of indices.

This is one of the reasons we introduced max_primary_shard_size, so that it scales regardless of the number of shards in the index. I would say that for 90%+ of the policies that we see, size-based rollover makes much more sense than document-count rollover, since the size of documents varies greatly depending on what type of data is indexed. Perhaps you have some experience with unevenly weighted shards that you can share? Is there a situation that max_shard_docs would solve where max_primary_shard_size doesn't work?

weizijun · 2021-11-30T16:28:13Z

@dakrone the shard size depends on the number of fields per document and on the field types, but search performance depends on the document count.

In the time series case, shards are small but the document count is large. In our case a shard holds about 40 million documents at about 4gb, and after our improvement to doc values compression the shard size is even smaller.

So we want to set a document count limit for the index, and we want to set a default policy for all indices that limits the index's max_shard_docs.
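The numbers above can be turned into a rough back-of-the-envelope estimate. The sketch below assumes the quoted figures (about 40 million documents in about 4gb, with gb read as 2^30 bytes) and a purely hypothetical max_primary_shard_size of 50gb chosen for illustration; the class and method names are made up as well.

```java
// Hypothetical helper; the 50gb size limit is an assumption for illustration only.
final class DocsAtSizeLimit {

    // Extrapolates how many documents a shard would hold when it reaches a size limit,
    // assuming the average document size stays the same as observed.
    static long docsAtSizeLimit(long observedDocs, long observedBytes, long sizeLimitBytes) {
        return Math.multiplyExact(observedDocs, sizeLimitBytes) / observedBytes;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        long docs = docsAtSizeLimit(40_000_000L, 4L * gib, 50L * gib);
        System.out.println(docs); // 500000000
    }
}
```

Under these assumptions a size-only policy would let a single shard grow to roughly 500 million small documents before rolling over, which is the situation a per-shard document limit is meant to avoid.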

weizijun · 2021-12-08T01:56:51Z

@dakrone is this feature worth developing? If so, I will continue working on it. I would like to have it, as the shard doc count can also affect query performance.

dakrone · 2021-12-08T18:56:37Z

@weizijun I'll mark this for discussion, I don't think I have a good use case for it, but there may be something else I'm missing.

weizijun · 2021-12-09T02:07:12Z

> @weizijun I'll mark this for discussion, I don't think I have a good use case for it, but there may be something else I'm missing.

OK, thanks. Once you reach a conclusion, I will continue development.

weizijun · 2022-01-13T03:50:13Z

Hi @dakrone @martijnvg, how is the discussion going? Is it worth continuing development?

martijnvg

This new condition makes sense, when dealing with data sources that have many documents that are small in bytes on disk (which is the case for time series data sources). I left two comments as an initial review.

martijnvg · 2022-01-24T10:02:27Z

client/rest-high-level/src/main/java/org/elasticsearch/client/ilm/RolloverAction.java

@@ -26,11 +26,12 @@
    private static final ParseField MAX_PRIMARY_SHARD_SIZE_FIELD = new ParseField("max_primary_shard_size");
    private static final ParseField MAX_AGE_FIELD = new ParseField("max_age");
    private static final ParseField MAX_DOCS_FIELD = new ParseField("max_docs");
+    private static final ParseField MAX_SHARD_DOCS_FIELD = new ParseField("max_shard_docs");


I don't think this new limit needs to be added to the HLRC code base.
The HLRC is no longer developed, in favour of the new Java client.
It only exists in master/8.0 for tests that still rely on the HLRC,
so there should be no need to add new things to it.
The HLRC is no longer released / published and will be removed in the near future.
So I think we can undo all changes in the client directory.

Got it, I will revert them.

martijnvg · 2022-01-24T10:05:18Z

docs/reference/ilm/actions/ilm-rollover.asciidoc

@@ -81,6 +81,14 @@ replicas are ignored.
 TIP: To see the current shard size, use the <<cat-shards, _cat shards>> API.
 The `store` value shows the size each shard, and `prirep` indicates whether a
 shard is a primary (`p`) or a replica (`r`).
+
+`max_shard_docs`::


Maybe a better name would be max_single_shard_docs?
Since this condition will apply if any shard reaches this limit?

> Since this condition will apply if any shard reaches this limit?

Yes, any shard that reaches the limit will cause a rollover.

I'm not sure which of the following is better:

max_shard_docs

max_single_shard_docs

max_primary_shard_docs

We can discuss and make a decision.

Maybe max_primary_shard_docs is better here, given how the doc count is computed in TransportRolloverAction#buildStats(...)? In this code non-primary shards are filtered out, just as the code already does when computing maxPrimaryShardSize.

Yeah, I also prefer max_primary_shard_docs. I will change the name to max_primary_shard_docs.
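The filtering described here can be sketched as follows. This is not the code from TransportRolloverAction#buildStats(...); `ShardDocs` and `maxPrimaryShardDocs` are hypothetical names used only to show why the value reflects primary shards alone.

```java
import java.util.List;

// Hypothetical sketch; not the actual Elasticsearch implementation.
final class PrimaryShardDocStats {

    // Minimal stand-in for per-shard stats: whether the copy is a primary and its doc count.
    static final class ShardDocs {
        final boolean primary;
        final long docCount;

        ShardDocs(boolean primary, long docCount) {
            this.primary = primary;
            this.docCount = docCount;
        }
    }

    // Largest document count among primary shards; replica copies are filtered out.
    // Returns 0 when there are no primary shards in the list.
    static long maxPrimaryShardDocs(List<ShardDocs> shards) {
        return shards.stream()
            .filter(shard -> shard.primary)
            .mapToLong(shard -> shard.docCount)
            .max()
            .orElse(0L);
    }

    public static void main(String[] args) {
        List<ShardDocs> shards = List.of(
            new ShardDocs(true, 40L),
            new ShardDocs(false, 70L), // replica copy, ignored
            new ShardDocs(true, 25L)
        );
        System.out.println(maxPrimaryShardDocs(shards)); // 40
    }
}
```

Because only primaries contribute, the name max_primary_shard_docs describes the computed value more precisely than max_shard_docs or max_single_shard_docs.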

weizijun · 2022-01-24T10:46:01Z

> This new condition makes sense, when dealing with data sources that have many documents that are small in bytes on disk (which is the case for time series data sources)

Thank you, I will continue to complete the code.

* upstream/master: (762 commits) [DOCS] Add note to that log4j customization is outside the support scope (elastic#82668) Batch Index Settings Update Requests (elastic#82896) [DOCS] Delete pipeline containing stored script (elastic#83102) Try again to fix changelog areas after reorg (elastic#83100) Bind to non-localhost for transport in some cases (elastic#82973) [DOCS] Reuse multi-level `join` warning (elastic#82976) Remove unnecessary CopyOnWriteHashMap class (elastic#83040) Adjust changelog categories after reorg (elastic#83087) [DOCS] Fix typo in `action.destructive_requires_name` breaking change (elastic#83085) Stack Monitoring: Add Enterprise Search monitoring index templates (elastic#82743) [DOCS] Fix stored script example snippet (elastic#83056) [DOCS] Re-add network traffic para to `term` query (elastic#83047) [DOCS] Rename example stored script (elastic#83054) [ML][DOCS] Add Trained model APIs to the REST APIs index (elastic#82791) [ML] Update running process when global calendar changes (elastic#83044) [Transform] Fix condition on which the transform stops processing buckets (elastic#82852) [DOCS] Fixes field names in ML sum functions. (elastic#83048) [ML] fix NLP tokenization never_split handling around punctuation (elastic#82982) Construct dynamic updates directly via object builders (elastic#81449) Emit trace.id into audit logs (elastic#82849) ... 
# Conflicts: # client/rest-high-level/src/test/java/org/elasticsearch/client/IndicesClientIT.java # client/rest-high-level/src/test/java/org/elasticsearch/client/documentation/ILMDocumentationIT.java # server/src/main/java/org/elasticsearch/action/admin/indices/rollover/Condition.java # server/src/test/java/org/elasticsearch/action/admin/indices/rollover/ConditionTests.java # x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/RolloverActionTests.java # x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/TimeseriesLifecycleTypeTests.java # x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/WaitForRolloverReadyStepTests.java

martijnvg · 2022-02-03T08:54:37Z

@elasticmachine test this please

martijnvg

This looks good @weizijun, I left a small comment.

...rc/yamlRestTest/resources/rest-api-spec/test/indices.rollover/25_max_shard_doc_condition.yml

martijnvg · 2022-02-03T08:58:47Z

The resize API has a max_primary_shard_size parameter. Do I also need to add a max_primary_shard_docs parameter to the resize API? So far I have not added it. If the parameter is needed there, I will continue to complete the code.

I don't think this is needed for now. If it is needed later, it can be added in a different PR.
Also apologies for the delay in responding to this PR.

weizijun · 2022-02-03T11:59:17Z

> I don't think this is needed for now. If it is needed later, it can be added in a different PR.
> Also apologies for the delay in responding to this PR.

It's OK. I guess you are busy working on data streams with TSDB, and this week is our Chinese Spring Festival vacation. I will update the code and find out why the tests failed.

martijnvg · 2022-02-03T12:01:35Z

> and this week is our Chinese Spring Festival vacation.

No need to update this PR during your vacation. Enjoy your time off 👍.

weizijun · 2022-02-03T12:09:17Z

> No need to update this PR during your vacation. Enjoy your time off 👍.

Thanks, I will fix it next week.

weizijun · 2022-02-07T02:42:49Z

@elasticmachine update branch

martijnvg · 2022-02-07T08:38:52Z

@elasticmachine test this please

weizijun · 2022-02-07T14:12:03Z

@martijnvg All checks have passed, and I will resolve the conflicting file.

* upstream/master: [DOCS] Switch xrefs to external links (elastic#83590) [DOCS] 'features' flag added in elastic#83083 (elastic#83452) Rename ChangePolicyforIndexIT to ChangePolicyForIndexIT (elastic#83569) Fixing random_sampler tests (elastic#83549) Upgrade Checkstyle to 9.3 (elastic#83314) Make improvements to the release notes generator (elastic#83525) Cleanup DataTierAllocationDecider (elastic#83572) Upgrade jANSI dependency to 2.4.0 (elastic#83566) Speed up Name Collision Check in Metadata.Builder (elastic#83340) SQL: Add range checks to interval multiplication operation (elastic#83478) Remove DiscoveryNodes#getAllNodes (elastic#83538) Make RoutingNodes behave like a collection (elastic#83540) Remove Unused CS Listener from SecurityServerTransportInterceptor (elastic#83556)

martijnvg

I left a small docs comment, otherwise this LGTM 👍.

docs/reference/ilm/actions/ilm-rollover.asciidoc

martijnvg · 2022-02-08T09:31:53Z

@elasticmachine update branch

martijnvg · 2022-02-08T09:32:19Z

@elasticmachine test this please

martijnvg · 2022-02-08T13:04:23Z

Thanks for contributing this change @weizijun!

weizijun · 2022-02-08T13:36:31Z

> Thanks for contributing this change @weizijun!

Thanks @martijnvg for your review!

* upstream/master: (166 commits) Bind host all instead of just _site_ when needed (elastic#83145) [DOCS] Fix min/max agg snippets for histograms (elastic#83695) [DOCS] Add deprecation notice for system indices (elastic#83688) Cache ILM policy name on IndexMetadata (elastic#83603) [DOCS] Fix 8.0 breaking changes sort order (elastic#83685) [ML] fix random sampling background query consistency (elastic#83676) Move internal APIs into their own namespace '_internal' Runtime fields core-with-mapped tests support tsdb (elastic#83577) Optimize calculating the presence of a quorum (elastic#83638) Use switch expressions in EnableAllocationDecider and NodeShutdownAllocationDecider (elastic#83641) Note libffi error message in tmpdir docs (elastic#83662) Fix TransportDesiredNodesActionsIT batch tests (elastic#83659) [DOCS] Remove unused upgrade doc files (elastic#83617) [ML] Wait for model process to stop in stop deployment (elastic#83644) [ML] Fix submit after shutdown in process worker service (elastic#83645) Remove req/resp classes associated with HLRC (elastic#83599) Introduce index.version.compatibility setting (elastic#83264) Rename InternalTestCluster#getMasterNodeInstance (elastic#83407) Mute TimeSeriesIndexSearcherTests testCollectInOrderAcrossSegments (elastic#83648) Add rollover add max_primary_shard_docs condition (elastic#80981) ... # Conflicts: # x-pack/plugin/rollup/build.gradle # x-pack/plugin/rollup/src/test/java/org/elasticsearch/xpack/rollup/v2/RollupActionSingleNodeTests.java

weizijun added 4 commits November 22, 2021 20:17

rollover add max_shard_docs condition

d1c4ed7

fixup

3cea0d7

rollover add max_shard_docs condition

ed8fc6e

rollover add max_shard_docs condition

e8ad38f

elasticsearchmachine added the v8.1.0 and external-contributor labels Nov 24, 2021

weizijun added 2 commits November 24, 2021 12:01

fixup

7e7b33b

spotless

2939d24

jtibshirani added the :Data Management/ILM+SLM label Dec 1, 2021

dakrone added the team-discuss label Dec 8, 2021

martijnvg removed the team-discuss label Jan 24, 2022

martijnvg reviewed Jan 24, 2022

weizijun force-pushed the rollover-add-max_shard_docs branch from 06af3ef to 2939d24 Compare January 26, 2022 06:20

weizijun changed the title ~~Rollover add max_shard_docs condition~~ Rollover add max_primary_shard_docs condition Jan 26, 2022

weizijun added 6 commits January 26, 2022 16:07

fix exception

24cc5cd

rename

d86f1f8

rename

aca6b68

remove file

e60b76e

fix

d77fd75

improve

e9282d0

mark-vieira added the v8.2.0 label Feb 2, 2022

martijnvg added the >enhancement label Feb 3, 2022

martijnvg self-assigned this Feb 3, 2022

martijnvg reviewed Feb 3, 2022

...rc/yamlRestTest/resources/rest-api-spec/test/indices.rollover/25_max_shard_doc_condition.yml

spotless

af982a3

elasticmachine and others added 3 commits February 6, 2022 20:42

Merge branch 'master' into rollover-add-max_shard_docs

037621a

change version from 8.1.0 to 8.2.0

8396f2b

add change log

69ee711

martijnvg approved these changes Feb 8, 2022

docs/reference/ilm/actions/ilm-rollover.asciidoc

comment

764ce75

Merge branch 'master' into rollover-add-max_shard_docs

7facb5a

martijnvg merged commit 79132ed into elastic:master Feb 8, 2022

weizijun deleted the rollover-add-max_shard_docs branch April 28, 2022 08:47

dakrone mentioned this pull request Jul 14, 2022

Add support for max_primary_shard_docs rollover criteria to ILM UI elastic/kibana#135088

Closed
