Commit 029cd94

Implement remote cluster CCS telemetry (#112478) (#113814)
* Add remote cluster stats to _cluster/stats
* Implement remote cluster stats polling
* Add docs for the include_remotes part

(cherry picked from commit b26d81c)
1 parent 620e780 commit 029cd94

15 files changed, +939 -54 lines changed


docs/reference/cluster/stats.asciidoc

+114-10
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,10 @@ If a node does not respond before its timeout expires, the response does not inc
4040
However, timed out nodes are included in the response's `_nodes.failed` property.
4141
Defaults to no timeout.
4242

43+
`include_remotes`::
44+
(Optional, Boolean) If `true`, includes remote cluster information in the response.
45+
Defaults to `false`, so no remote cluster information is returned.
46+
4347
[role="child_attributes"]
4448
[[cluster-stats-api-response-body]]
4549
==== {api-response-body-title}
@@ -183,12 +187,11 @@ This number is based on documents in Lucene segments and may include documents f
183187
This number is based on documents in Lucene segments. {es} reclaims the disk space of deleted Lucene documents when a segment is merged.
184188

185189
`total_size_in_bytes`::
186-
(integer)
187-
Total size in bytes across all primary shards assigned to selected nodes.
190+
(integer) Total size in bytes across all primary shards assigned to selected nodes.
188191

189192
`total_size`::
190-
(string)
191-
Total size across all primary shards assigned to selected nodes, as a human-readable string.
193+
(string) Total size across all primary shards assigned to selected nodes, as a human-readable string.
194+
192195
=====
193196
194197
`store`::
@@ -1285,8 +1288,7 @@ They are included here for expert users, but should otherwise be ignored.
12851288
====
12861289
12871290
`repositories`::
1288-
(object) Contains statistics about the <<snapshot-restore,snapshot>> repositories defined in the cluster, broken down
1289-
by repository type.
1291+
(object) Contains statistics about the <<snapshot-restore,snapshot>> repositories defined in the cluster, broken down by repository type.
12901292
+
12911293
.Properties of `repositories`
12921294
[%collapsible%open]
@@ -1314,13 +1316,74 @@ Each repository type may also include other statistics about the repositories of
13141316
[%collapsible%open]
13151317
=====
13161318
1319+
`clusters`:::
1320+
(object) Contains remote cluster settings and metrics collected from them.
1321+
The keys are cluster names, and the values are per-cluster data.
1322+
Only present if the `include_remotes` option is set to `true`.
1323+
1324+
+
1325+
.Properties of `clusters`
1326+
[%collapsible%open]
1327+
======
1328+
1329+
`cluster_uuid`:::
1330+
(string) The UUID of the remote cluster.
1331+
1332+
`mode`:::
1333+
(string) The <<sniff-proxy-modes, connection mode>> used to communicate with the remote cluster.
1334+
1335+
`skip_unavailable`:::
1336+
(Boolean) The `skip_unavailable` <<skip-unavailable-clusters, setting>> used for this remote cluster.
1337+
1338+
`transport.compress`:::
1339+
(string) Transport compression setting used for this remote cluster.
1340+
1341+
`version`:::
1342+
(array of strings) The list of {es} versions used by the nodes on the remote cluster.
1343+
1344+
`status`:::
1345+
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=cluster-health-status]
1346+
+
1347+
See <<cluster-health>>.
1348+
1349+
`nodes_count`:::
1350+
(integer) The total count of nodes in the remote cluster.
1351+
1352+
`shards_count`:::
1353+
(integer) The total number of shards in the remote cluster.
1354+
1355+
`indices_count`:::
1356+
(integer) The total number of indices in the remote cluster.
1357+
1358+
`indices_total_size_in_bytes`:::
1359+
(integer) Total data set size, in bytes, of all shards assigned to selected nodes.
1360+
1361+
`indices_total_size`:::
1362+
(string) Total data set size of all shards assigned to selected nodes, as a human-readable string.
1363+
1364+
`max_heap_in_bytes`:::
1365+
(integer) Maximum amount of memory, in bytes, available for use by the heap across the nodes of the remote cluster.
1366+
1367+
`max_heap`:::
1368+
(string) Maximum amount of memory available for use by the heap across the nodes of the remote cluster,
1369+
as a human-readable string.
1370+
1371+
`mem_total_in_bytes`:::
1372+
(integer) Total amount, in bytes, of physical memory across the nodes of the remote cluster.
1373+
1374+
`mem_total`:::
1375+
(string) Total amount of physical memory across the nodes of the remote cluster, as a human-readable string.
1376+
1377+
======
1378+
13171379
13181380
`_search`:::
1319-
(object) Contains the telemetry information about the <<modules-cross-cluster-search, {ccs}>> usage in the cluster.
1381+
(object) Contains information about the <<modules-cross-cluster-search, {ccs}>> usage in the cluster.
13201382
+
13211383
.Properties of `_search`
13221384
[%collapsible%open]
13231385
======
1386+
13241387
`total`:::
13251388
(integer) The total number of {ccs} requests that have been executed by the cluster.
13261389

@@ -1336,6 +1399,7 @@ Each repository type may also include other statistics about the repositories of
13361399
.Properties of `took`
13371400
[%collapsible%open]
13381401
=======
1402+
13391403
`max`:::
13401404
(integer) The maximum time taken to execute a {ccs} request, in milliseconds.
13411405
@@ -1344,6 +1408,7 @@ Each repository type may also include other statistics about the repositories of
13441408
13451409
`p90`:::
13461410
(integer) The 90th percentile of the time taken to execute {ccs} requests, in milliseconds.
1411+
13471412
=======
13481413

13491414
`took_mrt_true`::
@@ -1361,6 +1426,7 @@ Each repository type may also include other statistics about the repositories of
13611426
13621427
`p90`:::
13631428
(integer) The 90th percentile of the time taken to execute {ccs} requests, in milliseconds.
1429+
13641430
=======
13651431

13661432
`took_mrt_false`::
@@ -1378,6 +1444,7 @@ Each repository type may also include other statistics about the repositories of
13781444
13791445
`p90`:::
13801446
(integer) The 90th percentile of the time taken to execute {ccs} requests, in milliseconds.
1447+
13811448
=======
13821449

13831450
`remotes_per_search_max`::
@@ -1391,9 +1458,10 @@ Each repository type may also include other statistics about the repositories of
13911458
The keys are the failure reason names and the values are the number of requests that failed for that reason.
13921459

13931460
`features`::
1394-
(object) Contains statistics about the features used in {ccs} requests. The keys are the names of the search feature,
1395-
and the values are the number of requests that used that feature. Single request can use more than one feature
1396-
(e.g. both `async` and `wildcard`). Known features are:
1461+
(object) Contains statistics about the features used in {ccs} requests.
1462+
The keys are the names of the search feature, and the values are the number of requests that used that feature.
1463+
A single request can use more than one feature (e.g. both `async` and `wildcard`).
1464+
Known features are:
13971465

13981466
* `async` - <<async-search, Async search>>
13991467

@@ -1427,6 +1495,7 @@ This may include requests where partial results were returned, but not requests
14271495
.Properties of `took`
14281496
[%collapsible%open]
14291497
========
1498+
14301499
`max`:::
14311500
(integer) The maximum time taken to execute a {ccs} request, in milliseconds.
14321501

@@ -1435,6 +1504,7 @@ This may include requests where partial results were returned, but not requests
14351504

14361505
`p90`:::
14371506
(integer) The 90th percentile of the time taken to execute {ccs} requests, in milliseconds.
1507+
14381508
========
14391509
14401510
=======
@@ -1812,3 +1882,37 @@ This API can be restricted to a subset of the nodes using <<cluster-nodes,node f
18121882
--------------------------------------------------
18131883
GET /_cluster/stats/nodes/node1,node*,master:false
18141884
--------------------------------------------------
1885+
1886+
This API call will return data about the remote clusters if any are configured:
1887+
1888+
[source,console]
1889+
--------------------------------------------------
1890+
GET /_cluster/stats?include_remotes=true
1891+
--------------------------------------------------
1892+
1893+
The resulting response will contain the `ccs` object with information about the remote clusters:
1894+
1895+
[source,js]
1896+
--------------------------------------------------
1897+
{
  "ccs": {
    "clusters": {
      "remote_cluster": {
        "cluster_uuid": "YjAvIhsCQ9CbjWZb2qJw3Q",
        "mode": "sniff",
        "skip_unavailable": false,
        "transport.compress": "true",
        "version": ["8.16.0"],
        "status": "green",
        "nodes_count": 10,
        "shards_count": 420,
        "indices_count": 10,
        "indices_total_size_in_bytes": 6232658362,
        "max_heap_in_bytes": 1037959168,
        "mem_total_in_bytes": 137438953472
      }
    }
  }
}
1917+
--------------------------------------------------
1918+
// TESTRESPONSE[skip:TODO]
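
As an illustration of how a client might consume the `ccs.clusters` object documented above, here is a minimal Python sketch. It is not part of the commit. The function name `summarize_remotes` and the summary fields are hypothetical, and the sample body is copied from the documentation example. Because `clusters` is only present when `include_remotes=true` is requested, the sketch treats a missing key as "no remote data" rather than as an error.

```python
import json

# Sample response body copied from the documentation example in this diff.
SAMPLE_RESPONSE = json.loads("""
{
  "ccs": {
    "clusters": {
      "remote_cluster": {
        "cluster_uuid": "YjAvIhsCQ9CbjWZb2qJw3Q",
        "mode": "sniff",
        "skip_unavailable": false,
        "transport.compress": "true",
        "version": ["8.16.0"],
        "status": "green",
        "nodes_count": 10,
        "shards_count": 420,
        "indices_count": 10,
        "indices_total_size_in_bytes": 6232658362,
        "max_heap_in_bytes": 1037959168,
        "mem_total_in_bytes": 137438953472
      }
    }
  }
}
""")


def summarize_remotes(stats):
    """Return one summary dict per remote cluster, sorted by cluster name.

    Hypothetical helper: `ccs.clusters` is absent unless include_remotes=true
    was requested, so a missing key yields an empty list.
    """
    clusters = stats.get("ccs", {}).get("clusters", {})
    return [
        {
            "name": name,
            "status": info.get("status"),
            "nodes": info.get("nodes_count"),
            "versions": info.get("version", []),
            # Convert bytes to GiB for display; rounding is for readability only.
            "size_gib": round(info.get("indices_total_size_in_bytes", 0) / 2**30, 2),
        }
        for name, info in sorted(clusters.items())
    ]


if __name__ == "__main__":
    for row in summarize_remotes(SAMPLE_RESPONSE):
        print(row)
```

Calling `summarize_remotes` on a response fetched without `include_remotes` returns an empty list, which matches the documented default of returning no remote cluster information.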

rest-api-spec/src/main/resources/rest-api-spec/api/cluster.stats.json

+2-2
Original file line numberDiff line numberDiff line change
@@ -32,9 +32,9 @@
3232
]
3333
},
3434
"params":{
35-
"flat_settings":{
35+
"include_remotes":{
3636
"type":"boolean",
37-
"description":"Return settings in flat format (default: false)"
37+
"description":"Include remote cluster data in the response (default: false)"
3838
},
3939
"timeout":{
4040
"type":"time",
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,151 @@
---
"cross-cluster search stats basic":
  - requires:
      test_runner_features: [ capabilities ]
      capabilities:
        - method: GET
          path: /_cluster/stats
          capabilities:
            - "ccs-stats"
      reason: "Capability required to run test"

  - do:
      cluster.stats: { }

  - is_true: ccs
  - is_true: ccs._search
  - is_false: ccs.clusters # no ccs clusters configured
  - exists: ccs._search.total
  - exists: ccs._search.success
  - exists: ccs._search.skipped
  - is_true: ccs._search.took
  - is_true: ccs._search.took_mrt_true
  - is_true: ccs._search.took_mrt_false
  - exists: ccs._search.remotes_per_search_max
  - exists: ccs._search.remotes_per_search_avg
  - exists: ccs._search.failure_reasons
  - exists: ccs._search.features
  - exists: ccs._search.clients
  - exists: ccs._search.clusters

---
"cross-cluster search stats search":
  - requires:
      test_runner_features: [ capabilities ]
      capabilities:
        - method: GET
          path: /_cluster/stats
          capabilities:
            - "ccs-stats"
      reason: "Capability required to run test"

  - do:
      cluster.state: {}
  - set: { master_node: master }
  - do:
      nodes.info:
        metric: [ http, transport ]
  - set: {nodes.$master.http.publish_address: host}
  - set: {nodes.$master.transport.publish_address: transport_host}

  - do:
      cluster.put_settings:
        body:
          persistent:
            cluster:
              remote:
                cluster_one:
                  seeds:
                    - "${transport_host}"
                  skip_unavailable: true
                cluster_two:
                  seeds:
                    - "${transport_host}"
                  skip_unavailable: false
  - is_true: persistent.cluster.remote.cluster_one

  - do:
      indices.create:
        index: test
        body:
          settings:
            number_of_replicas: 0

  - do:
      index:
        index: test
        id: "1"
        refresh: true
        body:
          foo: bar

  - do:
      cluster.health:
        wait_for_status: green

  - do:
      search:
        index: "*,*:*"
        body:
          query:
            match:
              foo: bar

  - do:
      cluster.stats: {}
  - is_true: ccs
  - is_true: ccs._search
  - is_false: ccs.clusters # Still no remotes since include_remotes is not set

  - do:
      cluster.stats:
        include_remotes: true
  - is_true: ccs
  - is_true: ccs._search
  - is_true: ccs.clusters # Now we have remotes
  - is_true: ccs.clusters.cluster_one
  - is_true: ccs.clusters.cluster_two
  - is_true: ccs.clusters.cluster_one.cluster_uuid
  - match: { ccs.clusters.cluster_one.mode: sniff }
  - match: { ccs.clusters.cluster_one.skip_unavailable: true }
  - match: { ccs.clusters.cluster_two.skip_unavailable: false }
  - is_true: ccs.clusters.cluster_one.version
  - match: { ccs.clusters.cluster_one.status: green }
  - match: { ccs.clusters.cluster_two.status: green }
  - is_true: ccs.clusters.cluster_one.nodes_count
  - is_true: ccs.clusters.cluster_one.shards_count
  - is_true: ccs.clusters.cluster_one.indices_count
  - is_true: ccs.clusters.cluster_one.indices_total_size_in_bytes
  - is_true: ccs.clusters.cluster_one.max_heap_in_bytes
  - is_true: ccs.clusters.cluster_one.mem_total_in_bytes
  - is_true: ccs._search.total
  - is_true: ccs._search.success
  - exists: ccs._search.skipped
  - is_true: ccs._search.took
  - is_true: ccs._search.took.max
  - is_true: ccs._search.took.avg
  - is_true: ccs._search.took.p90
  - is_true: ccs._search.took_mrt_true
  - exists: ccs._search.took_mrt_true.max
  - exists: ccs._search.took_mrt_true.avg
  - exists: ccs._search.took_mrt_true.p90
  - is_true: ccs._search.took_mrt_false
  - exists: ccs._search.took_mrt_false.max
  - exists: ccs._search.took_mrt_false.avg
  - exists: ccs._search.took_mrt_false.p90
  - match: { ccs._search.remotes_per_search_max: 2 }
  - match: { ccs._search.remotes_per_search_avg: 2.0 }
  - exists: ccs._search.failure_reasons
  - exists: ccs._search.features
  - exists: ccs._search.clients
  - is_true: ccs._search.clusters
  - is_true: ccs._search.clusters.cluster_one
  - is_true: ccs._search.clusters.cluster_two
  - gte: {ccs._search.clusters.cluster_one.total: 1}
  - gte: {ccs._search.clusters.cluster_two.total: 1}
  - exists: ccs._search.clusters.cluster_one.skipped
  - exists: ccs._search.clusters.cluster_two.skipped
  - is_true: ccs._search.clusters.cluster_one.took
  - is_true: ccs._search.clusters.cluster_one.took.max
  - is_true: ccs._search.clusters.cluster_one.took.avg
  - is_true: ccs._search.clusters.cluster_one.took.p90