Skip to content

Add stats for time spent fetching data while searching snapshots #51866

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

DaveCTurner
Copy link
Contributor

@DaveCTurner DaveCTurner commented Feb 4, 2020

This commit builds on #51637, adding tracking of the total time spent fetching
data from the blob store.

Relates #50999.

This commit builds on elastic#51637, adding tracking of the total time spent fetching
data from the blob store and a linear regression model for these fetches.
@DaveCTurner DaveCTurner added WIP :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Feb 4, 2020
@DaveCTurner DaveCTurner requested review from tlrx and ywelsch February 4, 2020 14:19
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@DaveCTurner
Copy link
Contributor Author

I'm not sure the linear regression is actually useful here. Since we're tracking stats on a file-by-file basis, essentially every fetch will be the same size, which stops us from building a meaningful model of the fetch time as a function of size. Maybe we just want to track the total took time? Possibly min and max too?

@tlrx
Copy link
Member

tlrx commented Feb 6, 2020

I tend to agree with you, unless we implement different range sizes per files (which we could do easily) I'm not sure the linear regression is very useful. Maybe we could reuse the existing counters and track the total/min/max took times as you suggested.

@DaveCTurner DaveCTurner removed the WIP label Feb 24, 2020
Copy link
Contributor Author

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated following discussions on other channels; this is ready for a proper review now.

@@ -106,7 +106,7 @@ public void onRepositoriesModule(RepositoriesModule repositoriesModule) {
@Override
public Map<String, DirectoryFactory> getDirectoryFactories() {
return Map.of(SearchableSnapshotRepository.SNAPSHOT_DIRECTORY_FACTORY_KEY,
SearchableSnapshotRepository.newDirectoryFactory(repositoriesService::get, cacheService::get));
SearchableSnapshotRepository.newDirectoryFactory(repositoriesService::get, cacheService::get, System::nanoTime));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using System::nanoTime since we need finer resolution than ThreadPool::relativeTimeInNanos offers.

@DaveCTurner DaveCTurner changed the title Add stats for fetch speed as a function of size Add stats for time spent fetching data while searching snapshots Feb 24, 2020
@DaveCTurner DaveCTurner requested a review from tlrx February 24, 2020 13:52
Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks David!

@DaveCTurner DaveCTurner merged commit c9ac57f into elastic:feature/searchable-snapshots Feb 24, 2020
@DaveCTurner DaveCTurner deleted the 2020-02-04-searchable-snapshots-track-time-spent-fetching-from-blob-store branch February 24, 2020 15:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants