Skip to content

Make sure listener is resolved when file queue is cleared #89929

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Sep 9, 2022

Conversation

pxsalehi
Copy link
Member

@pxsalehi pxsalehi commented Sep 8, 2022

Handle a bug introduced in #88209 while refactoring how file upload tasks run in a shard snapshot. The corner case where the queue of files to snapshot gets cleared when a file snapshot runs into an exception is not addressed in that PR.

Closes #89927
Closes #89956

@pxsalehi pxsalehi added the :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs label Sep 8, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Sep 8, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Copy link
Collaborator

Hi @pxsalehi, I've created a changelog YAML for you.

@pxsalehi
Copy link
Member Author

pxsalehi commented Sep 8, 2022

I would like to also somehow refactor that part a bit, and add a test for this corner case. Whether here or in another PR. Since there is a failing test, we might want to merge this. I have added a unit test for this which fails consistently w/o the fix (https://github.com/pxsalehi/elasticsearch/tree/ps-220908-reprod-cleared-queue has the test but not the fix).

@pxsalehi pxsalehi added >test Issues or PRs that are addressing/adding tests >bug and removed >bug labels Sep 8, 2022
@pxsalehi pxsalehi force-pushed the ps-220908-handle-cleared-queue branch from e8ab227 to d9c3ca6 Compare September 8, 2022 15:03
Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this so quickly @pxsalehi ! The fix looks fine, just one complaint about a test detail :)

files.add(ShardSnapshotTaskRunnerTests.dummyFileInfo());
}
repository.snapshotFiles(context, files, allFilesUploadListener);
assertBusy(() -> assertTrue(listenerCalled.get()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of dislike this pattern in tests, it causes test to be needlessly slow in aggregate because of the sleeps in the busy polling from assertTrue. Maybe just use a PlainActionFuture for the listener above and then wait for its completion here?

@pxsalehi pxsalehi merged commit 7164e35 into elastic:main Sep 9, 2022
@pxsalehi
Copy link
Member Author

pxsalehi commented Sep 9, 2022

Thanks, Armin!

weizijun added a commit to weizijun/elasticsearch that referenced this pull request Sep 9, 2022
* main: (176 commits)
  Fix RandomSamplerAggregatorTests testAggregationSamplingNestedAggsScaled test failure (elastic#89958)
  [Downsampling] Replace document map with SMILE encoded doc (elastic#89495)
  Remove full cluster state from error logging in MasterService (elastic#89960)
  [ML] Truncate categorization fields (elastic#89827)
  [TSDB] Removed `summary` and `histogram` metric types (elastic#89937)
  Update testNodeSelectorRouting so that it does not depend on iteration order (elastic#89879)
  Make sure listener is resolved when file queue is cleared (elastic#89929)
  [Stable plugin api] Extensible annotation (elastic#89903)
  Fix double sending of response in TransportOpenIdConnectPrepareAuthenticationAction (elastic#89930)
  Make sure ivy repo directory exists before downloading artifacts
  Use 'file://' scheme for local repository URL
  Use DRA artifacts for release build CI jobs
  Log unsuccessful attempts to get credentials from web identity tokens (elastic#88241)
  Script: Write Field API path manipulation (elastic#89889)
  Fetch health info action (elastic#89820)
  Fix memory leak in TransportDeleteExpiredDataAction (elastic#89935)
  [ML] Performance improvements for categorization jobs (elastic#89824)
  [DOCS] Revert changes for ES_JAVA_OPTS (elastic#89931)
  Fix deadlock bug exposed by a test (elastic#89934)
  [Downsampling] Remove `FieldValueFetcher` validator (elastic#89497)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test Issues or PRs that are addressing/adding tests v8.5.0
Projects
None yet
3 participants