-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Fix deadlock bug exposed by a test #89934
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
A new test exposed a very rare bug where the file settings service was in the middle of processing the file when the node closed. This terminated the cluster state update task, but nobody unlocked the latch await. The fix allows the stop operation to properly terminate the watcher thread. Unit tests added that expose the bug.
Pinging @elastic/es-core-infra (Team:Core/Infra) |
Hi @grcevski, I've created a changelog YAML for you. |
…ticsearch into bug/file_settings_shutdown
ChrisHegarty
approved these changes
Sep 8, 2022
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Thanks Chris! |
grcevski
added a commit
to grcevski/elasticsearch
that referenced
this pull request
Sep 8, 2022
A new test exposed a very rare bug where the file settings service was in the middle of processing the file when the node closed. This terminated the cluster state update task, but nobody unlocked the latch await. The fix allows the stop operation to properly terminate the watcher thread.
💚 Backport successful
|
elasticsearchmachine
pushed a commit
that referenced
this pull request
Sep 8, 2022
A new test exposed a very rare bug where the file settings service was in the middle of processing the file when the node closed. This terminated the cluster state update task, but nobody unlocked the latch await. The fix allows the stop operation to properly terminate the watcher thread.
weizijun
added a commit
to weizijun/elasticsearch
that referenced
this pull request
Sep 9, 2022
* main: (34 commits) Make sure ivy repo directory exists before downloading artifacts Use 'file://' scheme for local repository URL Use DRA artifacts for release build CI jobs Log unsuccessful attempts to get credentials from web identity tokens (elastic#88241) Script: Write Field API path manipulation (elastic#89889) Fetch health info action (elastic#89820) Fix memory leak in TransportDeleteExpiredDataAction (elastic#89935) [ML] Performance improvements for categorization jobs (elastic#89824) [DOCS] Revert changes for ES_JAVA_OPTS (elastic#89931) Fix deadlock bug exposed by a test (elastic#89934) [Downsampling] Remove `FieldValueFetcher` validator (elastic#89497) Fix segment stats in tsdb (elastic#89754) Synthetic _source: support dense_vector (elastic#89840) REST tests fetching fields with synthetic _source (elastic#89888) Do not deserialize back BytesTransportRequest to clone a request in MockTransportService (elastic#89926) Add SDK request logging to debug failures of S3BlobStoreRepositoryTests#testRequestStats (elastic#89912) Fix SnapshotStatusApisIT.testGetSnapshotsWithSnapshotInProgress (elastic#89925) Document synthetic source for text and keyword (elastic#89893) Fix CloneSnapshotIT.testRemoveFailedCloneFromCSWithQueuedSnapshotInProgress (elastic#89914) Add missing index.mapping.total_fields.limit setting to the target index (elastic#89875) ...
weizijun
added a commit
to weizijun/elasticsearch
that referenced
this pull request
Sep 9, 2022
* main: (176 commits) Fix RandomSamplerAggregatorTests testAggregationSamplingNestedAggsScaled test failure (elastic#89958) [Downsampling] Replace document map with SMILE encoded doc (elastic#89495) Remove full cluster state from error logging in MasterService (elastic#89960) [ML] Truncate categorization fields (elastic#89827) [TSDB] Removed `summary` and `histogram` metric types (elastic#89937) Update testNodeSelectorRouting so that it does not depend on iteration order (elastic#89879) Make sure listener is resolved when file queue is cleared (elastic#89929) [Stable plugin api] Extensible annotation (elastic#89903) Fix double sending of response in TransportOpenIdConnectPrepareAuthenticationAction (elastic#89930) Make sure ivy repo directory exists before downloading artifacts Use 'file://' scheme for local repository URL Use DRA artifacts for release build CI jobs Log unsuccessful attempts to get credentials from web identity tokens (elastic#88241) Script: Write Field API path manipulation (elastic#89889) Fetch health info action (elastic#89820) Fix memory leak in TransportDeleteExpiredDataAction (elastic#89935) [ML] Performance improvements for categorization jobs (elastic#89824) [DOCS] Revert changes for ES_JAVA_OPTS (elastic#89931) Fix deadlock bug exposed by a test (elastic#89934) [Downsampling] Remove `FieldValueFetcher` validator (elastic#89497) ...
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
>bug
:Core/Infra/Core
Core issues without another label
Team:Core/Infra
Meta label for core/infra team
v8.4.2
v8.5.0
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
A new test exposed a very rare bug where the file settings service was in the middle of processing the file when the node closed. This terminated the cluster state update task, but nobody unlocked the latch await. The fix allows the stop operation to properly terminate the watcher thread.
Essentially, the cluster state tasks we run while processing the settings.json file, can terminate when the node is being shutdown, without calling the task success or failure methods. This means that if we are waiting for all async tasks to finish inside the file watcher processing loop, we'll be waiting forever. This problem was exposed while writing a new integration test for the repository settings.
#89601
Unit tests added that expose the bug consistently.