Add REST API for cache directory stats #51815

Merged

Member

@tlrx tlrx commented Feb 3, 2020

Note: this pull request targets the feature/searchable-snapshots branch

This pull request adds a REST API that exposes the various CacheDirectory stats added in #51637. It adds the necessary action, transport action and request and response objects as well as a new qa:rest project for REST tests.

The REST endpoint is _searchable_snapshots/stats (as a TransportNodesAction, it can be filtered by node IDs) and the response looks like:

{
           "_shards" : {
             "total" : 2,
             "successful" : 1,
             "failed" : 0
           },
           "indices" : {
             "index" : {
               "shards" : {
                 "0" : [
                   {
                     "snapshot_uuid" : "CJ95kbqcSYCu5wBAiUDDNA",
                     "index_uuid" : "dllhcZ7kSai9XeBhVjEhIg",
                     "shard" : {
                       "state" : "STARTED",
                       "primary" : true,
                       "node" : "Zpw8ZgeqT9mCmmfRA1feNw"
                     },
                     "files" : [
                       {
                         "name" : "_0.cfe",
                         "length" : 405,
                         "open_count" : 6,
                         "inner_count" : 1,
                         "close_count" : 6,
                         "contiguous_bytes_read" : {
                           "count" : 5,
                           "sum" : 2025,
                           "min" : 405,
                           "max" : 405
                         },
                         "non_contiguous_bytes_read" : {
                           "count" : 1,
                           "sum" : 16,
                           "min" : 16,
                           "max" : 16
                         },
                         "cached_bytes_read" : {
                           "count" : 6,
                           "sum" : 2041,
                           "min" : 16,
                           "max" : 405
                         },
                         "cached_bytes_written" : {
                           "count" : 1,
                           "sum" : 405,
                           "min" : 405,
                           "max" : 405
                         },
                         "direct_bytes_read" : {
                           "count" : 0,
                           "sum" : 0,
                           "min" : 0,
                           "max" : 0
                         },
                         "forward_seeks" : {
                           "small" : {
                             "count" : 0,
                             "sum" : 0,
                             "min" : 0,
                             "max" : 0
                           },
                           "large" : {
                             "count" : 0,
                             "sum" : 0,
                             "min" : 0,
                             "max" : 0
                           }
                         },
                         "backward_seeks" : {
                           "small" : {
                             "count" : 0,
                             "sum" : 0,
                             "min" : 0,
                             "max" : 0
                           },
                           "large" : {
                             "count" : 0,
                             "sum" : 0,
                             "min" : 0,
                             "max" : 0
                           }
                         }
                       },

               ...
}
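
Each byte counter in the response above carries the same four fields: count, sum, min and max. The following is a minimal sketch of how such a summary could be maintained, not the code from this PR; the `Counter` class and its `add` method are hypothetical names chosen for illustration. Feeding it five reads of 405 bytes and one read of 16 bytes (the contiguous and non-contiguous reads reported for `_0.cfe`) gives values that are numerically consistent with the `cached_bytes_read` entry, and an untouched counter reports all zeros, like `direct_bytes_read`.

```java
// Hypothetical count/sum/min/max summary mirroring the four fields in the response.
final class Counter {
    private long count;
    private long sum;
    private long min;
    private long max;

    // Record one observed value, for example the number of bytes in one read.
    void add(long value) {
        // The first value initialises min and max; later values tighten them.
        min = (count == 0) ? value : Math.min(min, value);
        max = (count == 0) ? value : Math.max(max, value);
        count++;
        sum += value;
    }

    @Override
    public String toString() {
        return "{count=" + count + ", sum=" + sum + ", min=" + min + ", max=" + max + "}";
    }
}

final class CounterDemo {
    public static void main(String[] args) {
        Counter cachedBytesRead = new Counter();
        for (int i = 0; i < 5; i++) {
            cachedBytesRead.add(405); // five reads of the whole 405-byte file
        }
        cachedBytesRead.add(16);      // one 16-byte read
        System.out.println(cachedBytesRead); // prints {count=6, sum=2041, min=16, max=405}
    }
}
```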

@tlrx tlrx added >enhancement :Distributed Coordination/Snapshot/Restore labels Feb 3, 2020
@tlrx tlrx requested a review from DaveCTurner February 3, 2020 14:46
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

Contributor

@DaveCTurner DaveCTurner left a comment

Before I get too far into this, are we sure about this being a TransportNodesAction and not a TransportBroadcastByNodeAction? I think the response from even a single node might be overwhelming if it holds a lot of indices, given the level of detail, and it would be better to be able to get the stats from a single index across the whole cluster.

@tlrx
Member Author

tlrx commented Feb 4, 2020

> I think the response from even a single node might be overwhelming if it holds a lot of indices, given the level of detail, and it would be better to be able to get the stats from a single index across the whole cluster.

Thanks @DaveCTurner. That sounds like the right thing to do, indeed. I'll update the PR.

@tlrx
Member Author

tlrx commented Feb 4, 2020

@DaveCTurner I've updated the code so that the transport action is now a TransportBroadcastByNodeAction. This makes it possible to gather stats for a given index across all nodes. I've also updated the PR description with the new response format (I changed it to be more similar to other APIs such as the Indices Segments API).

If the need arises, we could also aggregate stats per index, per shard or per file, but I haven't done so for now because it complicates things. We could change this later.
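
To illustrate the difference discussed here, the following sketch models the broadcast-by-node idea in plain Java: shard copies of the requested indices are grouped so that each node receives one request, and the per-shard results are then regrouped by index and shard number, as in the response format. It does not use the Elasticsearch classes; `ShardCopy`, `groupByNode` and `regroupByIndex` are hypothetical names, and the node and index names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in for one shard copy allocated on a node.
final class ShardCopy {
    final String index;
    final int shard;
    final String nodeId;

    ShardCopy(String index, int shard, String nodeId) {
        this.index = index;
        this.shard = shard;
        this.nodeId = nodeId;
    }
}

final class BroadcastByNodeSketch {
    // Step 1: keep only copies of the requested indices and group them per node,
    // so that each node is contacted once with the list of shards it holds.
    static Map<String, List<ShardCopy>> groupByNode(List<ShardCopy> copies, Set<String> indices) {
        Map<String, List<ShardCopy>> byNode = new TreeMap<>();
        for (ShardCopy copy : copies) {
            if (indices.contains(copy.index)) {
                byNode.computeIfAbsent(copy.nodeId, k -> new ArrayList<>()).add(copy);
            }
        }
        return byNode;
    }

    // Step 2: regroup the per-node results as index -> shard number -> node ids.
    static Map<String, Map<Integer, List<String>>> regroupByIndex(Map<String, List<ShardCopy>> byNode) {
        Map<String, Map<Integer, List<String>>> byIndex = new TreeMap<>();
        for (List<ShardCopy> nodeCopies : byNode.values()) {
            for (ShardCopy copy : nodeCopies) {
                byIndex.computeIfAbsent(copy.index, k -> new TreeMap<>())
                       .computeIfAbsent(copy.shard, k -> new ArrayList<>())
                       .add(copy.nodeId);
            }
        }
        return byIndex;
    }

    public static void main(String[] args) {
        List<ShardCopy> copies = Arrays.asList(
            new ShardCopy("logs", 0, "nodeA"),
            new ShardCopy("logs", 0, "nodeB"),
            new ShardCopy("logs", 1, "nodeA"),
            new ShardCopy("other", 0, "nodeB"));
        Map<String, List<ShardCopy>> byNode = groupByNode(copies, Collections.singleton("logs"));
        System.out.println(byNode.keySet());        // prints [nodeA, nodeB]
        System.out.println(regroupByIndex(byNode)); // prints {logs={0=[nodeA, nodeB], 1=[nodeA]}}
    }
}
```

In this model a request for a single index touches only the nodes that hold its shards, and each node reports only those shards, which is the property the review asked for.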

@tlrx tlrx requested a review from DaveCTurner February 4, 2020 12:44
Contributor

@DaveCTurner DaveCTurner left a comment

Looks good. I left only minor comments and questions.

"documentation": {
"url": "https://www.elastic.co/guide/en/elasticsearch/reference/current/searchable-snapshots-get-stats.html //NORELEASE"
},
"stability": "experimental",
Contributor

We should at least mark this line as //NORELEASE too, but really I think we should be bolder and say that by the time this is merged we expect this API not to be experimental any more, so there's no need for this.

Member Author

I agree. Sadly, we can't mark this line as //NORELEASE too, because the stability is checked against a specified set of values. We also can't add comments to this JSON file.

I think that experimental reflects the current state of this API for now. I hooked onto the existing //NORELEASE and added more documentation about the stability of the API (see fde0a86).

}
}
}
return state.routingTable().allShards(searchableSnapshotIndices.toArray(new String[0]));
Contributor

Ideally, we would throw an exception if the user specified an index (or pattern) which didn't match any searchable snapshots, but this would naturally fall to the IndexNameExpressionResolver which doesn't seem to have a suitable extension point for this kind of logic.

However, what do you think about handling at least the case where a user specifies a single index with a typo, matching nothing, by throwing an IndexNotFoundException (INFE)? That is, if request.indices() is neither empty nor ["*"] nor ["_all"], but searchableSnapshotIndices is empty, then that's a bad request.

Member Author

That sounds like a good idea. I implemented it a bit differently in c9468d1, though: if one or more concrete indices were resolved (taking security into account) but none of them has a searchable store type (searchableSnapshotIndices is empty), a resource not found exception is thrown. I've updated the REST API test for this.

Contributor

Hmm, this means that GET _searchable_snapshots/stats returns 200 OK in an empty cluster but 404 Not Found if there are any indices at all but none of them are searchable snapshots. I think that'll be surprising. Maybe it'd be best always to throw a ResourceNotFoundException (RNFE) if searchableSnapshotIndices is empty.

Member Author

I'm not entirely sure how it should behave, partly because Security has its own specific behavior, so I went with your suggestion in d1049a3, which has the advantage of being consistent.
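
The rule the thread settles on (always fail with "resource not found" when no searchable snapshot index is left after resolution) can be sketched as follows. This is an illustration under assumptions, not the code from d1049a3: `ResourceNotFound`, `StatsIndexFilter` and the `"snapshot"` store type value are hypothetical stand-ins, and the input map plays the role of the already resolved concrete indices with their store types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical exception standing in for the "resource not found" (HTTP 404) error.
final class ResourceNotFound extends RuntimeException {
    ResourceNotFound(String message) {
        super(message);
    }
}

final class StatsIndexFilter {
    // Assumption for this sketch: this store type value marks a searchable snapshot index.
    static final String SNAPSHOT_STORE_TYPE = "snapshot";

    // Keep only searchable snapshot indices; fail whenever none remain, including
    // when no index was resolved at all, so the behavior is the same in every case.
    static List<String> searchableSnapshotIndices(Map<String, String> storeTypeByIndex) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : storeTypeByIndex.entrySet()) {
            if (SNAPSHOT_STORE_TYPE.equals(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        if (result.isEmpty()) {
            throw new ResourceNotFound("no searchable snapshot indices found");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> resolved = new LinkedHashMap<>();
        resolved.put("logs", "snapshot");
        resolved.put("metrics", "fs");
        System.out.println(searchableSnapshotIndices(resolved)); // prints [logs]

        try {
            searchableSnapshotIndices(new LinkedHashMap<>()); // empty cluster
        } catch (ResourceNotFound e) {
            System.out.println("404: " + e.getMessage());
        }
    }
}
```

With this rule an empty cluster and a cluster that holds only regular indices both produce the same error, which removes the 200 versus 404 inconsistency raised in the review.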

Contributor

Looks good. Test failures are not obviously related, maybe we need to merge something?

Member Author

Yes, we need to merge master, but I'm waiting for bwc to be re-enabled first.

@tlrx tlrx requested a review from DaveCTurner February 5, 2020 11:02
@tlrx
Member Author

tlrx commented Feb 5, 2020

@DaveCTurner I've updated the PR with your feedback. Let me know if you have more feedback, thanks!

@tlrx
Member Author

tlrx commented Feb 5, 2020

@elasticmachine run elasticsearch-ci/bwc
@elasticmachine run elasticsearch-ci/default-distro

@tlrx
Member Author

tlrx commented Feb 6, 2020

@elasticmachine update branch

Contributor

@DaveCTurner DaveCTurner left a comment

LGTM

@tlrx tlrx merged commit c73cf68 into elastic:feature/searchable-snapshots Feb 6, 2020
@tlrx tlrx deleted the add-instrumentation-step-2 branch February 6, 2020 13:05
@tlrx
Member Author

tlrx commented Feb 6, 2020

Thanks David!

Labels
:Distributed Coordination/Snapshot/Restore >enhancement