Track evacuated IDs since the last shard-local refresh in LiveVersionMap #95331

Merged
pxsalehi merged 26 commits into elastic:main from ps230417-LiveVersionMap-archiver
May 4, 2023

Conversation

pxsalehi
Member

@pxsalehi pxsalehi commented Apr 18, 2023

This PR adds an interface and noop implementation for LiveVersionMapArchive.
It receives old maps from the LiveVersionMap upon a refresh, and is also informed
when unpromotable shards have been refreshed (one way of doing this is implemented
in the matching Serverless PR).

relates ES-5728
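
For readers following along, here is a minimal sketch of the shape described above: an archive that receives the old map after a refresh, is told when unpromotable shards have refreshed, and has a noop default. All names (`VersionArchive`, `SingleGenerationArchive`, the method names) and the `String`/`Long` types are assumptions made for illustration; they are not taken from the PR's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the archive described in the PR text.
// Keys are document IDs and values are versions; the real types are not shown here.
interface VersionArchive {
    // Called with the old map that a refresh has just retired.
    void afterRefresh(Map<String, Long> oldMap);

    // Called once the unpromotable shards are known to have refreshed.
    void afterUnpromotablesRefreshed();

    // Returns the archived version for an ID, or null if none is held.
    Long get(String id);

    // Default used when nothing needs to be archived.
    VersionArchive NOOP = new VersionArchive() {
        @Override public void afterRefresh(Map<String, Long> oldMap) {}
        @Override public void afterUnpromotablesRefreshed() {}
        @Override public Long get(String id) { return null; }
    };
}

// Toy non-noop implementation (an assumption, not the Serverless one):
// it keeps archived entries until the unpromotable shards have refreshed.
class SingleGenerationArchive implements VersionArchive {
    private final Map<String, Long> archived = new HashMap<>();

    @Override public synchronized void afterRefresh(Map<String, Long> oldMap) {
        archived.putAll(oldMap);
    }

    @Override public synchronized void afterUnpromotablesRefreshed() {
        archived.clear();
    }

    @Override public synchronized Long get(String id) {
        return archived.get(id);
    }
}
```

The toy implementation drops everything on the first notification, which is deliberately cruder than what a real archive would need; it only illustrates the two callbacks and the lookup.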

@pxsalehi
Member Author

pxsalehi commented Apr 18, 2023

For now please ignore correctness/concurrency issues. I'm wondering if I could create and pass around UnpromotableRefresher entirely from within the stateless plugin.

@Tim-Brooks
Contributor

I'm wondering if I could create and pass around UnpromotableRefresher entirely from within the stateless plugin.

You can create overridable methods that only IndexEngine implements.
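
A minimal sketch of that pattern, assuming a base engine whose hook returns a noop and a single subclass that overrides it. `BaseEngine`, `StatelessLikeEngine`, `Refresher` and `createRefresher` are hypothetical names used only for illustration.

```java
// Hypothetical stand-in for the collaborator under discussion.
interface Refresher {
    String describe();
}

// Base engine: the hook returns a noop by default, so other engines are unaffected.
class BaseEngine {
    private final Refresher refresher;

    BaseEngine() {
        // Caveat: an overridable method called from a constructor runs before
        // subclass fields are initialized, so overrides must not depend on them.
        this.refresher = createRefresher();
    }

    // Overridable hook; only the engine that needs a real refresher overrides it.
    protected Refresher createRefresher() {
        return () -> "noop";
    }

    Refresher refresher() {
        return refresher;
    }
}

// The one engine that supplies a non-noop implementation.
class StatelessLikeEngine extends BaseEngine {
    @Override
    protected Refresher createRefresher() {
        return () -> "stateless";
    }
}
```

The constructor caveat in the comment is the main design cost of this approach: the override cannot rely on state that the subclass sets up in its own constructor.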

I added this PR as a discussion point for the sync tomorrow. I spent some time looking at it, but still thinking through the approach. I think we should all discuss it.

@pxsalehi pxsalehi force-pushed the ps230417-LiveVersionMap-archiver branch from 0ed09aa to f421076 Compare April 19, 2023 09:37
@pxsalehi
Member Author

You can create overridable methods that only IndexEngine implements.

I meant more how to wire-up and pass along an UnpromotableRefresher that is NOT a no-op, entirely from Stateless.java. It would need to be able to send UnpromotableShardRefreshRequests and have access to the IndexShard. I think for that we need to:

  • Make TransportUnpromotableShardRefreshAction an action that can be called using a NodeClient.
  • Create the stateless UnpromotableRefresher implementation and pass it to IndexEngine by giving it the required IndexShard using an IndexEventListener maybe.

At the moment this is not urgent, but I think that, if possible, this would clean up a lot of noise here and move it to the Serverless PR, where these things are actually used/needed.

@pxsalehi pxsalehi force-pushed the ps230417-LiveVersionMap-archiver branch from f421076 to e04e324 Compare April 19, 2023 14:00
@pxsalehi pxsalehi marked this pull request as ready for review April 20, 2023 15:14
@pxsalehi pxsalehi added >non-issue :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels Apr 20, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Apr 20, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@pxsalehi pxsalehi requested review from tlrx and removed request for DaveCTurner April 20, 2023 15:14
Contributor

@henningandersen henningandersen left a comment

Left a few comments after an initial skimming of central pieces.

@pxsalehi pxsalehi force-pushed the ps230417-LiveVersionMap-archiver branch from 233412c to 83e84d6 Compare April 26, 2023 13:20
@pxsalehi pxsalehi force-pushed the ps230417-LiveVersionMap-archiver branch from 83e84d6 to 638b3db Compare April 26, 2023 16:49
Member Author

@pxsalehi pxsalehi left a comment

@henningandersen @tlrx This is now ready for review again. I've removed most of the clutter.

@pxsalehi pxsalehi requested review from henningandersen and arteam and removed request for kingherc April 26, 2023 16:57
Contributor

@henningandersen henningandersen left a comment

This looks much leaner, I am largely on board but have not gone in depth yet. I have some first comments about the structure and interaction between server and stateless (also in the companion PR).

Contributor

@henningandersen henningandersen left a comment

One more comment around safety.

Member Author

@pxsalehi pxsalehi left a comment

@henningandersen I've addressed the clean ups you suggested. Please take another look.

Contributor

@henningandersen henningandersen left a comment

This looks good to me. I wonder if we can add some minimal testing of the archive interaction in :server: ? Perhaps something like:

  1. Validate that an override of createLiveVersionMapArchiver results in a LiveVersionMap with the archive in it.
  2. Validate that beforeRefresh does not call it and afterRefresh does call it.
  3. Validate that an archive with some value in it is found when using the live-version map.
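
The three checks above could be sketched roughly as follows, against toy stand-ins rather than the real engine classes. `ToyEngine`, `ToyVersionMap`, `Archive`, `RecordingArchive` and `createArchive` are all hypothetical, and plain `AssertionError`s stand in for whatever test framework the module uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical archive contract: receives old maps and answers lookups.
interface Archive {
    void afterRefresh(Map<String, Long> old);
    Long get(String id);

    Archive NOOP = new Archive() {
        @Override public void afterRefresh(Map<String, Long> old) {}
        @Override public Long get(String id) { return null; }
    };
}

// Test double that records how often it was called and what it received.
class RecordingArchive implements Archive {
    final Map<String, Long> entries = new HashMap<>();
    int afterRefreshCalls = 0;

    @Override public void afterRefresh(Map<String, Long> old) {
        afterRefreshCalls++;
        entries.putAll(old);
    }

    @Override public Long get(String id) { return entries.get(id); }
}

// Toy version map: beforeRefresh rotates maps, afterRefresh hands the old one over.
class ToyVersionMap {
    private final Archive archive;
    private Map<String, Long> current = new HashMap<>();
    private Map<String, Long> old = new HashMap<>();

    ToyVersionMap(Archive archive) { this.archive = archive; }

    Archive archive() { return archive; }
    void put(String id, long version) { current.put(id, version); }
    void beforeRefresh() { old = current; current = new HashMap<>(); }
    void afterRefresh() { archive.afterRefresh(old); old = new HashMap<>(); }

    // Lookup order: current map, then old map, then the archive.
    Long get(String id) {
        Long v = current.get(id);
        if (v == null) v = old.get(id);
        return v != null ? v : archive.get(id);
    }
}

// Toy engine exposing an overridable factory for the archive.
class ToyEngine {
    private final ToyVersionMap versionMap;

    ToyEngine() { this.versionMap = new ToyVersionMap(createArchive()); }

    protected Archive createArchive() { return Archive.NOOP; }
    ToyVersionMap versionMap() { return versionMap; }
}

class ArchiveInteractionSketch {
    static void check() {
        ToyEngine engine = new ToyEngine() {
            @Override protected Archive createArchive() { return new RecordingArchive(); }
        };
        ToyVersionMap map = engine.versionMap();

        // 1. The overridden factory's archive ends up inside the version map.
        if (!(map.archive() instanceof RecordingArchive)) throw new AssertionError("override ignored");
        RecordingArchive archive = (RecordingArchive) map.archive();

        // 2. beforeRefresh does not call the archive, afterRefresh does.
        map.put("doc-1", 7L);
        map.beforeRefresh();
        if (archive.afterRefreshCalls != 0) throw new AssertionError("called too early");
        map.afterRefresh();
        if (archive.afterRefreshCalls != 1) throw new AssertionError("not called after refresh");

        // 3. A value that now lives only in the archive is still found via the map.
        if (!Long.valueOf(7L).equals(map.get("doc-1"))) throw new AssertionError("archive not consulted");
    }
}
```

Keeping the checks this small is the point: they pin the contract (who calls the archive and when) without needing the setup that the full stateless tests require.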

@pxsalehi pxsalehi force-pushed the ps230417-LiveVersionMap-archiver branch from 2b6b21d to 6e169b2 Compare April 28, 2023 12:40
@pxsalehi
Member Author

@henningandersen I've addressed the changes.

I wonder if we can add some minimal testing of the archive interaction in :server: ?

hmm... I find that a bit of an overkill. I looked into that a bit and I think it would probably be a test with a bunch of awkward setups and trivial assertions. If you think that would be crucial for the PR, I can do that. I've tested all of those things in the serverless PR, where I think it actually matters.

@henningandersen
Contributor

@henningandersen I've addressed the changes.

I wonder if we can add some minimal testing of the archive interaction in :server: ?

hmm... I find that a bit of an overkill. I looked into that a bit and I think it would probably be a test with a bunch of awkward setups and trivial assertions. If you think that would be crucial for the PR, I can do that. I've tested all of those things in the serverless PR, where I think it actually matters.

I'd like to have it. I think an important mechanism like this deserves some basic testing here. The effort does not sound enormous to me. I get that there is limited value, but not breaking the contract and knowing that early is beneficial. And I fear that the large separation of test from the code here will cause them to not be updated in concert moving forward. If some of it requires big efforts I am happy to omit it, but can you give it a shot and we can chat about it if you think it is too big?

@pxsalehi
Member Author

pxsalehi commented May 2, 2023

@henningandersen I've added the requested tests, please have another look!

Contributor

@henningandersen henningandersen left a comment

LGTM.

@pxsalehi pxsalehi added auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) and removed auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) labels May 4, 2023
@pxsalehi pxsalehi merged commit e70020d into elastic:main May 4, 2023
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >non-issue Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.9.0
6 participants