Commit 2a49ad9

Slightly better hot threads for transport workers (#96315)
A completely idle `transport_worker` thread is reported as `0.0%` idle, which is confusing. Moreover, the docs on the network threading model do not reflect the changes made in #90482. This commit fixes both of those things.
1 parent f6e9bbf commit 2a49ad9

3 files changed: +11 -11 lines changed

docs/reference/modules/network/threading.asciidoc

Lines changed: 4 additions & 6 deletions
@@ -63,7 +63,7 @@ reported like this:
 
 [source,text]
 ----
-100.0% [cpu=0.0%, other=100.0%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000004][transport_worker][T#1]'
+0.0% [cpu=0.0%, idle=100.0%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000004][transport_worker][T#1]'
 10/10 snapshots sharing following 9 elements
 [email protected]/sun.nio.ch.EPoll.wait(Native Method)
 [email protected]/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:118)
@@ -77,11 +77,9 @@ reported like this:
 ----
 
 Note that `transport_worker` threads should always be in state `RUNNABLE`, even
-when waiting for input, because they block in the native `EPoll#wait` method.
-This means the hot threads API will report these threads at 100% overall
-utilisation. This is normal, and the breakdown of time into `cpu=` and `other=`
-fractions shows how much time the thread spent running and waiting for input
-respectively.
+when waiting for input, because they block in the native `EPoll#wait` method. The `idle=`
+time reports the proportion of time the thread spent waiting for input, whereas the `cpu=` time
+reports the proportion of time the thread spent processing input it has received.
 
 If a `transport_worker` thread is not frequently idle, it may build up a
 backlog of work. This can cause delays in processing messages on the channels
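As a reading aid for the documented output format, the `cpu=` and `idle=` fractions can be pulled out of a hot threads header line with a small parser. This is only a sketch: the class `HotThreadsHeaderSketch` and the method `parseBreakdown` are hypothetical names, not part of Elasticsearch, and the pattern assumes the header layout shown in the docs example above.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading hot threads output; not part of Elasticsearch. */
final class HotThreadsHeaderSketch {

    // Matches the "[cpu=X%, idle=Y%]" breakdown reported for transport_worker threads.
    private static final Pattern BREAKDOWN =
        Pattern.compile("\\[cpu=(\\d+(?:\\.\\d+)?)%, idle=(\\d+(?:\\.\\d+)?)%\\]");

    /** Returns {cpuPercent, idlePercent}, or null if the line has no cpu/idle breakdown. */
    static double[] parseBreakdown(String headerLine) {
        Matcher m = BREAKDOWN.matcher(headerLine);
        if (m.find() == false) {
            return null;
        }
        return new double[] { Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2)) };
    }

    public static void main(String[] args) {
        String line = "0.0% [cpu=0.0%, idle=100.0%] (500ms out of 500ms) cpu usage by thread "
            + "'elasticsearch[instance-0000000004][transport_worker][T#1]'";
        double[] breakdown = parseBreakdown(line);
        // Prints: cpu=0.0 idle=100.0
        System.out.println(String.format(Locale.ROOT, "cpu=%.1f idle=%.1f", breakdown[0], breakdown[1]));
    }
}
```

Lines that use the `other=` label, as non-transport threads do, return `null` from this sketch because the pattern only accepts `idle=`.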

server/src/main/java/org/elasticsearch/monitor/jvm/HotThreads.java

Lines changed: 3 additions & 1 deletion
@@ -313,7 +313,9 @@ String innerDetect(ThreadMXBean threadBean, SunThreadInfo sunThreadInfo, long cu
                 );
                 case CPU -> {
                     double percentCpu = getTimeSharePercentage(topThread.getCpuTime());
-                    double percentOther = getTimeSharePercentage(topThread.getOtherTime());
+                    double percentOther = Transports.isTransportThread(threadName) && topThread.getCpuTime() == 0L
+                        ? 100.0
+                        : getTimeSharePercentage(topThread.getOtherTime());
                     double percentTotal = (Transports.isTransportThread(threadName)) ? percentCpu : percentOther + percentCpu;
                     String otherLabel = (Transports.isTransportThread(threadName)) ? "idle" : "other";
                     sb.append(
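The effect of the changed expression can be tried in isolation with a standalone sketch. The class `HotThreadsShareSketch`, its method names, and the nanosecond parameters are assumptions made for illustration; in particular, `timeSharePercentage` is a guess at what `getTimeSharePercentage` computes (time as a percentage of the sampling interval), and the `boolean transportThread` flag stands in for `Transports.isTransportThread(threadName)`.

```java
import java.util.Locale;

/** Hypothetical, simplified stand-in for the CPU-mode header logic; not the Elasticsearch code. */
final class HotThreadsShareSketch {

    /** Assumed behaviour: the given time as a percentage of the sampling interval (both in nanoseconds). */
    static double timeSharePercentage(long timeNanos, long intervalNanos) {
        return ((double) timeNanos / intervalNanos) * 100.0;
    }

    /** Builds the "total [cpu=..., idle|other=...]" prefix using the same branching as the diff above. */
    static String cpuModeHeader(boolean transportThread, long cpuNanos, long otherNanos, long intervalNanos) {
        double percentCpu = timeSharePercentage(cpuNanos, intervalNanos);
        // A transport thread that used no CPU at all is reported as fully idle.
        double percentOther = transportThread && cpuNanos == 0L
            ? 100.0
            : timeSharePercentage(otherNanos, intervalNanos);
        // Idle time does not count towards the total for transport threads.
        double percentTotal = transportThread ? percentCpu : percentOther + percentCpu;
        String otherLabel = transportThread ? "idle" : "other";
        return String.format(Locale.ROOT, "%.1f%% [cpu=%.1f%%, %s=%.1f%%]", percentTotal, percentCpu, otherLabel, percentOther);
    }

    public static void main(String[] args) {
        // Idle transport thread over a 10ms interval: prints 0.0% [cpu=0.0%, idle=100.0%]
        System.out.println(cpuModeHeader(true, 0L, 0L, 10_000_000L));
        // Non-transport thread with 2ms CPU and 3ms other: prints 50.0% [cpu=20.0%, other=30.0%]
        System.out.println(cpuModeHeader(false, 2_000_000L, 3_000_000L, 10_000_000L));
    }
}
```

The first call mirrors the updated test expectation for `__mock_network_thread`, where zero CPU time and zero other time now yield `idle=100.0%` rather than `idle=0.0%`.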

server/src/test/java/org/elasticsearch/monitor/jvm/HotThreadsTests.java

Lines changed: 4 additions & 4 deletions
@@ -892,19 +892,19 @@ public void testInnerDetectCPUModeTransportThreads() throws Exception {
 
         assertThat(
             innerResult,
-            containsString("0.0% [cpu=0.0%, idle=0.0%] (0s out of 10ms) cpu usage by thread '__mock_network_thread 1'")
+            containsString("0.0% [cpu=0.0%, idle=100.0%] (0s out of 10ms) cpu usage by thread '__mock_network_thread 1'")
         );
         assertThat(
             innerResult,
-            containsString("0.0% [cpu=0.0%, idle=0.0%] (0s out of 10ms) cpu usage by thread '__mock_network_thread 2'")
+            containsString("0.0% [cpu=0.0%, idle=100.0%] (0s out of 10ms) cpu usage by thread '__mock_network_thread 2'")
         );
         assertThat(
             innerResult,
-            containsString("0.0% [cpu=0.0%, idle=0.0%] (0s out of 10ms) cpu usage by thread '__mock_network_thread 3'")
+            containsString("0.0% [cpu=0.0%, idle=100.0%] (0s out of 10ms) cpu usage by thread '__mock_network_thread 3'")
         );
         assertThat(
             innerResult,
-            containsString("0.0% [cpu=0.0%, idle=0.0%] (0s out of 10ms) cpu usage by thread '__mock_network_thread 4'")
+            containsString("0.0% [cpu=0.0%, idle=100.0%] (0s out of 10ms) cpu usage by thread '__mock_network_thread 4'")
         );
 
         // Test with the legacy sort order