Skip to content

Rollover add max_primary_shard_docs condition #80981

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Feb 8, 2022

Conversation

weizijun
Copy link
Contributor

@weizijun weizijun commented Nov 24, 2021

Rollover now support four condition:

  • max_age
  • max_docs
  • max_size
  • max_primary_shard_size

Add a new condition named max_primary_shard_docs .

Triggers rollover when the largest shard in the index reaches a certain number of documents.
This is the maximum docs of the shards in the index. As with max_docs.

@elasticsearchmachine elasticsearchmachine added v8.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Nov 24, 2021
@dakrone
Copy link
Member

dakrone commented Nov 29, 2021

@weizijun can you explain why you think this addition would be useful versus the regular max_docs condition? Are you concerned about unevenly weighted shards using custom routing during indexing?

@weizijun
Copy link
Contributor Author

@weizijun can you explain why you think this addition would be useful versus the regular max_docs condition? Are you concerned about unevenly weighted shards using custom routing during indexing?

@dakrone I think there are the lists benefit:

  1. Reduce the max docs in one shards, if a shards has so many docs, rollup\force_merge\search latency is affected.
  2. just as you see, to avoid the case about unevenly weighted shards.
  3. Don't care the number_of_shards, just care the max_docs in one shard. the policy can use in many kind of indices.

@dakrone
Copy link
Member

dakrone commented Nov 30, 2021

Don't care the number_of_shards, just care the max_docs in one shard. the policy can use in many kind of indices.

This is one of the reasons we introduced max_primary_shard_size, so that it scaled regardless of the number of shards in the index. I would say that for 90%+ of the policies that we see, size-based rollover makes much more sense than document-count rollover, since the size of documents varies greatly depending on what type of data is indexed. Perhaps you have some experience you can share about unevenly weighted shards you can share? Is there a situation max_shard_docs would solve that max_primary_shard_size doesn't work?

@weizijun
Copy link
Contributor Author

@dakrone the shard_size is different from the document fields count and the field's type, but the search performance is depend on the document counts.

In time_series case, the shard is small, but the counts is large, in our case about 40 million documents, the shard size is about 4gb, after our improvement with compressing the docvalues, the shard size is more smaller.

so we want to set the document count limit for the index. and we want to set a default policy for all indices, to limit the index's max_shard_docs

@jtibshirani jtibshirani added the :Data Management/ILM+SLM Index and Snapshot lifecycle management label Dec 1, 2021
@weizijun
Copy link
Contributor Author

weizijun commented Dec 8, 2021

@dakrone is the feature necessary to develop? if it's ok, I will continue to develop the feature. I prefer the feature, as the shard doc count can also affect query performance

@dakrone
Copy link
Member

dakrone commented Dec 8, 2021

@weizijun I'll mark this for discussion, I don't think I have a good use case for it, but there may be something else I'm missing.

@weizijun
Copy link
Contributor Author

weizijun commented Dec 9, 2021

@weizijun I'll mark this for discussion, I don't think I have a good use case for it, but there may be something else I'm missing.

ok, thanks, when you have conclusion, I will continue to develop.

@weizijun
Copy link
Contributor Author

hi, @dakrone @martijnvg , how is the discussion going on? is it necessary continue to develop?

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new condition makes sense, when dealing with data sources that have many documents that are small in bytes on disk (which is the case for time series data sources). I left two comments as an initial review.

@@ -26,11 +26,12 @@
private static final ParseField MAX_PRIMARY_SHARD_SIZE_FIELD = new ParseField("max_primary_shard_size");
private static final ParseField MAX_AGE_FIELD = new ParseField("max_age");
private static final ParseField MAX_DOCS_FIELD = new ParseField("max_docs");
private static final ParseField MAX_SHARD_DOCS_FIELD = new ParseField("max_shard_docs");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is needed to add this new limit to the HLRC code base.
This HLRC is no longer developed in favour for the new java client.
This just exists in master/8.0 for tests that still rely on hlrc,
but there should be no need to add new things to the HLRC.
The HLRC is no longer released / published and will be removed in the near future.
So I think we can undo all changes in the client directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, I will revert them.

@@ -81,6 +81,14 @@ replicas are ignored.
TIP: To see the current shard size, use the <<cat-shards, _cat shards>> API.
The `store` value shows the size each shard, and `prirep` indicates whether a
shard is a primary (`p`) or a replica (`r`).

`max_shard_docs`::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a better name would be max_single_shard_docs?
Since this condition will apply if any shard reach this limit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this condition will apply if any shard reach this limit?
yeah, any shard that reaches the limit will cause a rollover.

I'm not sure, which one of the following is better:

  • max_shard_docs
  • max_single_shard_docs
  • max_primary_shard_docs

We can discuss and make a decision.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe max_primary_shard_docs is better here? Given how the doc count is computed in TransportRolloverAction#buildStats(...). In this code non primary shards are filtered, just like this
code is already doing when computing maxPrimaryShardSize.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I also prefer max_primary_shard_docs, I will change the name to max_primary_shard_docs

@weizijun
Copy link
Contributor Author

This new condition makes sense, when dealing with data sources that have many documents that are small in bytes on disk (which is the case for time series data sources)

thank you, I will continue to complete the code.

@weizijun weizijun force-pushed the rollover-add-max_shard_docs branch from 06af3ef to 2939d24 Compare January 26, 2022 06:20
* upstream/master: (762 commits)
  [DOCS] Add note to that log4j customization is outside the support scope (elastic#82668)
  Batch Index Settings Update Requests (elastic#82896)
  [DOCS] Delete pipeline containing stored script (elastic#83102)
  Try again to fix changelog areas after reorg (elastic#83100)
  Bind to non-localhost for transport in some cases (elastic#82973)
  [DOCS] Reuse multi-level `join` warning (elastic#82976)
  Remove unnecessary CopyOnWriteHashMap class (elastic#83040)
  Adjust changelog categories after reorg (elastic#83087)
  [DOCS] Fix typo in `action.destructive_requires_name` breaking change (elastic#83085)
  Stack Monitoring: Add Enterprise Search monitoring index templates (elastic#82743)
  [DOCS] Fix stored script example snippet (elastic#83056)
  [DOCS] Re-add network traffic para to `term` query (elastic#83047)
  [DOCS] Rename example stored script (elastic#83054)
  [ML][DOCS] Add Trained model APIs to the REST APIs index (elastic#82791)
  [ML] Update running process when global calendar changes (elastic#83044)
  [Transform] Fix condition on which the transform stops processing buckets (elastic#82852)
  [DOCS] Fixes field names in ML sum functions. (elastic#83048)
  [ML] fix NLP tokenization never_split handling around punctuation (elastic#82982)
  Construct dynamic updates directly via object builders (elastic#81449)
  Emit trace.id into audit logs (elastic#82849)
  ...

# Conflicts:
#	client/rest-high-level/src/test/java/org/elasticsearch/client/IndicesClientIT.java
#	client/rest-high-level/src/test/java/org/elasticsearch/client/documentation/ILMDocumentationIT.java
#	server/src/main/java/org/elasticsearch/action/admin/indices/rollover/Condition.java
#	server/src/test/java/org/elasticsearch/action/admin/indices/rollover/ConditionTests.java
#	x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/RolloverActionTests.java
#	x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/TimeseriesLifecycleTypeTests.java
#	x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/WaitForRolloverReadyStepTests.java
@weizijun weizijun changed the title Rollover add max_shard_docs condition Rollover add max_primary_shard_docs condition Jan 26, 2022
@martijnvg martijnvg self-assigned this Feb 3, 2022
@martijnvg
Copy link
Member

@elasticmachine test this please

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good @weizijun, I left a small comment.

@martijnvg
Copy link
Member

The resize API have a max_primary_shard_size parameter, do I also need to add the max_primary_shard_docs parameter to the resize API? Now I don't add the max_primary_shard_docs. If it needed to add the parameter to the resize API , I will continue to complete the code.

I don't this is needed for now. Maybe when needed, this can be added via a different PR.
Also apologies for the delay in responding to this PR.

@weizijun
Copy link
Contributor Author

weizijun commented Feb 3, 2022

I don't this is needed for now. Maybe when needed, this can be added via a different PR.
Also apologies for the delay in responding to this PR.

It's ok, I guess you are busy doing data stream with TSDB, and this week is our Chinese Spring Festival vacation. I will update the code and find the reason of failed tests.

@martijnvg
Copy link
Member

and this week is our Chinese Spring Festival vacation.

No need to update this PR during your vacation. Enjoy your time off 👍.

@weizijun
Copy link
Contributor Author

weizijun commented Feb 3, 2022

No need to update this PR during your vacation. Enjoy your time off 👍.

Thanks, I will fixed it next week.

@weizijun
Copy link
Contributor Author

weizijun commented Feb 7, 2022

@elasticmachine update branch

@martijnvg
Copy link
Member

@elasticmachine test this please

@weizijun
Copy link
Contributor Author

weizijun commented Feb 7, 2022

@martijnvg All checks have passed, and I will resolve the conflicting file.

* upstream/master:
  [DOCS] Switch xrefs to external links (elastic#83590)
  [DOCS] 'features' flag added in elastic#83083 (elastic#83452)
  Rename ChangePolicyforIndexIT to ChangePolicyForIndexIT (elastic#83569)
  Fixing random_sampler tests (elastic#83549)
  Upgrade Checkstyle to 9.3 (elastic#83314)
  Make improvements to the release notes generator (elastic#83525)
  Cleanup DataTierAllocationDecider (elastic#83572)
  Upgrade jANSI dependency to 2.4.0 (elastic#83566)
  Speed up Name Collision Check in Metadata.Builder (elastic#83340)
  SQL: Add range checks to interval multiplication operation (elastic#83478)
  Remove DiscoveryNodes#getAllNodes (elastic#83538)
  Make RoutingNodes behave like a collection (elastic#83540)
  Remove Unused CS Listener from SecurityServerTransportInterceptor (elastic#83556)
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a small docs comment, otherwise this LGTM 👍.

@martijnvg
Copy link
Member

@elasticmachine update branch

@martijnvg
Copy link
Member

@elasticmachine test this please

@martijnvg martijnvg merged commit 79132ed into elastic:master Feb 8, 2022
@martijnvg
Copy link
Member

Thanks for contributing this change @weizijun!

@weizijun
Copy link
Contributor Author

weizijun commented Feb 8, 2022

Thanks for contributing this change @weizijun!

Thanks @martijnvg for your review!

weizijun added a commit to weizijun/elasticsearch that referenced this pull request Feb 9, 2022
* upstream/master: (166 commits)
  Bind host all instead of just _site_ when needed (elastic#83145)
  [DOCS] Fix min/max agg snippets for histograms (elastic#83695)
  [DOCS] Add deprecation notice for system indices (elastic#83688)
  Cache ILM policy name on IndexMetadata (elastic#83603)
  [DOCS] Fix 8.0 breaking changes sort order (elastic#83685)
  [ML] fix random sampling background query consistency (elastic#83676)
  Move internal APIs into their own namespace '_internal'
  Runtime fields core-with-mapped tests support tsdb (elastic#83577)
  Optimize calculating the presence of a quorum (elastic#83638)
  Use switch expressions in EnableAllocationDecider and NodeShutdownAllocationDecider (elastic#83641)
  Note libffi error message in tmpdir docs (elastic#83662)
  Fix TransportDesiredNodesActionsIT batch tests (elastic#83659)
  [DOCS] Remove unused upgrade doc files (elastic#83617)
  [ML] Wait for model process to stop in stop deployment (elastic#83644)
  [ML] Fix submit after shutdown in process worker service (elastic#83645)
  Remove req/resp classes associated with HLRC (elastic#83599)
  Introduce index.version.compatibility setting (elastic#83264)
  Rename InternalTestCluster#getMasterNodeInstance (elastic#83407)
  Mute TimeSeriesIndexSearcherTests testCollectInOrderAcrossSegments (elastic#83648)
  Add rollover add max_primary_shard_docs condition (elastic#80981)
  ...

# Conflicts:
#	x-pack/plugin/rollup/build.gradle
#	x-pack/plugin/rollup/src/test/java/org/elasticsearch/xpack/rollup/v2/RollupActionSingleNodeTests.java
@weizijun weizijun deleted the rollover-add-max_shard_docs branch April 28, 2022 08:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Data Management Meta label for data/management team v8.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants