-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Rollover add max_primary_shard_docs condition #80981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rollover add max_primary_shard_docs condition #80981
Conversation
@weizijun can you explain why you think this addition would be useful versus the regular |
@dakrone I think there are the lists benefit:
|
This is one of the reasons we introduced |
@dakrone the shard_size is different from the document fields count and the field's type, but the search performance is depend on the document counts. In time_series case, the shard is small, but the counts is large, in our case about 40 million documents, the shard size is about 4gb, after our improvement with compressing the docvalues, the shard size is more smaller. so we want to set the document count limit for the index. and we want to set a default policy for all indices, to limit the index's max_shard_docs |
@dakrone is the feature necessary to develop? if it's ok, I will continue to develop the feature. I prefer the feature, as the shard doc count can also affect query performance |
@weizijun I'll mark this for discussion, I don't think I have a good use case for it, but there may be something else I'm missing. |
ok, thanks, when you have conclusion, I will continue to develop. |
hi, @dakrone @martijnvg , how is the discussion going on? is it necessary continue to develop? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new condition makes sense, when dealing with data sources that have many documents that are small in bytes on disk (which is the case for time series data sources). I left two comments as an initial review.
@@ -26,11 +26,12 @@ | |||
private static final ParseField MAX_PRIMARY_SHARD_SIZE_FIELD = new ParseField("max_primary_shard_size"); | |||
private static final ParseField MAX_AGE_FIELD = new ParseField("max_age"); | |||
private static final ParseField MAX_DOCS_FIELD = new ParseField("max_docs"); | |||
private static final ParseField MAX_SHARD_DOCS_FIELD = new ParseField("max_shard_docs"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is needed to add this new limit to the HLRC code base.
This HLRC is no longer developed in favour for the new java client.
This just exists in master/8.0 for tests that still rely on hlrc,
but there should be no need to add new things to the HLRC.
The HLRC is no longer released / published and will be removed in the near future.
So I think we can undo all changes in the client directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, I will revert them.
@@ -81,6 +81,14 @@ replicas are ignored. | |||
TIP: To see the current shard size, use the <<cat-shards, _cat shards>> API. | |||
The `store` value shows the size each shard, and `prirep` indicates whether a | |||
shard is a primary (`p`) or a replica (`r`). | |||
|
|||
`max_shard_docs`:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a better name would be max_single_shard_docs?
Since this condition will apply if any shard reach this limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this condition will apply if any shard reach this limit?
yeah, any shard that reaches the limit will cause a rollover.
I'm not sure, which one of the following is better:
- max_shard_docs
- max_single_shard_docs
- max_primary_shard_docs
We can discuss and make a decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe max_primary_shard_docs
is better here? Given how the doc count is computed in TransportRolloverAction#buildStats(...)
. In this code non primary shards are filtered, just like this
code is already doing when computing maxPrimaryShardSize
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I also prefer max_primary_shard_docs
, I will change the name to max_primary_shard_docs
thank you, I will continue to complete the code. |
06af3ef
to
2939d24
Compare
* upstream/master: (762 commits) [DOCS] Add note to that log4j customization is outside the support scope (elastic#82668) Batch Index Settings Update Requests (elastic#82896) [DOCS] Delete pipeline containing stored script (elastic#83102) Try again to fix changelog areas after reorg (elastic#83100) Bind to non-localhost for transport in some cases (elastic#82973) [DOCS] Reuse multi-level `join` warning (elastic#82976) Remove unnecessary CopyOnWriteHashMap class (elastic#83040) Adjust changelog categories after reorg (elastic#83087) [DOCS] Fix typo in `action.destructive_requires_name` breaking change (elastic#83085) Stack Monitoring: Add Enterprise Search monitoring index templates (elastic#82743) [DOCS] Fix stored script example snippet (elastic#83056) [DOCS] Re-add network traffic para to `term` query (elastic#83047) [DOCS] Rename example stored script (elastic#83054) [ML][DOCS] Add Trained model APIs to the REST APIs index (elastic#82791) [ML] Update running process when global calendar changes (elastic#83044) [Transform] Fix condition on which the transform stops processing buckets (elastic#82852) [DOCS] Fixes field names in ML sum functions. (elastic#83048) [ML] fix NLP tokenization never_split handling around punctuation (elastic#82982) Construct dynamic updates directly via object builders (elastic#81449) Emit trace.id into audit logs (elastic#82849) ... # Conflicts: # client/rest-high-level/src/test/java/org/elasticsearch/client/IndicesClientIT.java # client/rest-high-level/src/test/java/org/elasticsearch/client/documentation/ILMDocumentationIT.java # server/src/main/java/org/elasticsearch/action/admin/indices/rollover/Condition.java # server/src/test/java/org/elasticsearch/action/admin/indices/rollover/ConditionTests.java # x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/RolloverActionTests.java # x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/TimeseriesLifecycleTypeTests.java # x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/WaitForRolloverReadyStepTests.java
@elasticmachine test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good @weizijun, I left a small comment.
...rc/yamlRestTest/resources/rest-api-spec/test/indices.rollover/25_max_shard_doc_condition.yml
Outdated
Show resolved
Hide resolved
I don't this is needed for now. Maybe when needed, this can be added via a different PR. |
It's ok, I guess you are busy doing data stream with TSDB, and this week is our Chinese Spring Festival vacation. I will update the code and find the reason of failed tests. |
No need to update this PR during your vacation. Enjoy your time off 👍. |
Thanks, I will fixed it next week. |
@elasticmachine update branch |
@elasticmachine test this please |
@martijnvg All checks have passed, and I will resolve the conflicting file. |
* upstream/master: [DOCS] Switch xrefs to external links (elastic#83590) [DOCS] 'features' flag added in elastic#83083 (elastic#83452) Rename ChangePolicyforIndexIT to ChangePolicyForIndexIT (elastic#83569) Fixing random_sampler tests (elastic#83549) Upgrade Checkstyle to 9.3 (elastic#83314) Make improvements to the release notes generator (elastic#83525) Cleanup DataTierAllocationDecider (elastic#83572) Upgrade jANSI dependency to 2.4.0 (elastic#83566) Speed up Name Collision Check in Metadata.Builder (elastic#83340) SQL: Add range checks to interval multiplication operation (elastic#83478) Remove DiscoveryNodes#getAllNodes (elastic#83538) Make RoutingNodes behave like a collection (elastic#83540) Remove Unused CS Listener from SecurityServerTransportInterceptor (elastic#83556)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a small docs comment, otherwise this LGTM 👍.
@elasticmachine update branch |
@elasticmachine test this please |
Thanks for contributing this change @weizijun! |
Thanks @martijnvg for your review! |
* upstream/master: (166 commits) Bind host all instead of just _site_ when needed (elastic#83145) [DOCS] Fix min/max agg snippets for histograms (elastic#83695) [DOCS] Add deprecation notice for system indices (elastic#83688) Cache ILM policy name on IndexMetadata (elastic#83603) [DOCS] Fix 8.0 breaking changes sort order (elastic#83685) [ML] fix random sampling background query consistency (elastic#83676) Move internal APIs into their own namespace '_internal' Runtime fields core-with-mapped tests support tsdb (elastic#83577) Optimize calculating the presence of a quorum (elastic#83638) Use switch expressions in EnableAllocationDecider and NodeShutdownAllocationDecider (elastic#83641) Note libffi error message in tmpdir docs (elastic#83662) Fix TransportDesiredNodesActionsIT batch tests (elastic#83659) [DOCS] Remove unused upgrade doc files (elastic#83617) [ML] Wait for model process to stop in stop deployment (elastic#83644) [ML] Fix submit after shutdown in process worker service (elastic#83645) Remove req/resp classes associated with HLRC (elastic#83599) Introduce index.version.compatibility setting (elastic#83264) Rename InternalTestCluster#getMasterNodeInstance (elastic#83407) Mute TimeSeriesIndexSearcherTests testCollectInOrderAcrossSegments (elastic#83648) Add rollover add max_primary_shard_docs condition (elastic#80981) ... # Conflicts: # x-pack/plugin/rollup/build.gradle # x-pack/plugin/rollup/src/test/java/org/elasticsearch/xpack/rollup/v2/RollupActionSingleNodeTests.java
Rollover now support four condition:
Add a new condition named
max_primary_shard_docs
.Triggers rollover when the largest shard in the index reaches a certain number of documents.
This is the maximum docs of the shards in the index. As with
max_docs
.