[CI] Builds failing due to Gradle test executor crash #52610

Closed
mark-vieira opened this issue Feb 21, 2020 · 45 comments
Labels: :Delivery/Build (build or test infrastructure), low-risk (an open issue or test failure that is a low risk to future releases), Team:Delivery (meta label for Delivery team), >test-failure (triaged test failures from CI)

Comments

@mark-vieira
Contributor

mark-vieira commented Feb 21, 2020

We are seeing occasional instances of errors that look like this:

Execution failed for task ':client:rest-high-level:test'.
> Process 'Gradle Test Executor 260' finished with non-zero exit value 1
This problem might be caused by incorrect test process configuration.
Please refer to the test execution section in the User Manual at https://docs.gradle.org/6.2/userguide/java_testing.html#sec:test_execution

Looking at the console and daemon logs we see these exceptions:

org.gradle.internal.remote.internal.MessageIOException: Could not read message from '/127.0.0.1:41042'.
	at org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:94)
	at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionReceive.run(MessageHub.java:268)
	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
	at java.base/java.lang.Thread.run(Thread.java:830)
Caused by: java.lang.IllegalArgumentException
	at org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:72)
	at org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:52)
	at org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:81)
	... 7 more
org.gradle.internal.remote.internal.ConnectException: Could not connect to server [22a1b02e-8ab0-42a3-ae0d-07afbc5a0358 port:37833, addresses:[/127.0.0.1]]. Tried addresses: [/127.0.0.1].
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:67)
	at org.gradle.internal.remote.internal.hub.MessageHubBackedClient.getConnection(MessageHubBackedClient.java:36)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:127)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:69)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:68)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:73)
Caused by: java.net.ConnectException: Connection refused
	at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:776)
	at java.base/sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:120)
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.tryConnect(TcpOutgoingConnector.java:81)
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:54)
	... 5 more

The strange thing about these messages is that none of the ports mentioned here belongs to either the daemon server or the Gradle client. So who's talking to whom here, and where are they getting these ports from? My other thought was that maybe one of these is an ES test cluster node, but none of the testclusters are using these ports. Perhaps internal test clusters?
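One possible reading of the port numbers: the traces show a worker (`GradleWorkerMain`) connecting to a server on 127.0.0.1, and the ports quoted above (41042, 37833) fall inside the default Linux ephemeral range (32768-60999). That would be consistent with, though it does not prove, sockets bound to port 0 and assigned by the OS rather than any fixed daemon or test cluster port. The Java sketch below is not Gradle code, and the class and method names are hypothetical. It shows how such a port is obtained and why connecting after the listener has gone away typically fails with `Connection refused`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; it does not use or reproduce any Gradle internals.
public class EphemeralPortDemo {

    /** Binds a loopback listener to port 0 so the OS picks a free port, then closes it. */
    public static int bindAndRelease() {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            return server.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if a loopback connection to the given port is refused. */
    public static boolean isRefused(int port) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 1000);
            return false; // something is listening on that port
        } catch (ConnectException e) {
            return true; // nothing is listening: "Connection refused"
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int port = bindAndRelease();
        System.out.println("OS-assigned port: " + port);
        System.out.println("refused after listener closed: " + isRefused(port));
    }
}
```

If the server side of such a connection has already exited, the client only sees the refusal, which would explain why the port numbers match nothing in the build configuration.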

I'm going to reach out to the folks at Gradle to try and track this down.

mark-vieira added the :Delivery/Build and >test-failure labels Feb 21, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Build)

@droberts195
Contributor

This is still happening intermittently. The two cases today so far are:

@DaveCTurner
Contributor

Another one recently:

@jkakavas
Member

Another one on 7.x intake https://gradle-enterprise.elastic.co/s/xaekqb5ejkpmc

@tvernum
Contributor

tvernum commented Jun 12, 2020

Another (7.x)
https://gradle-enterprise.elastic.co/s/hkbqbkj7tl37g/console-log?task=:plugins:analysis-icu:test

org.gradle.internal.remote.internal.ConnectException: Could not connect to server [80d32bcf-ecbf-415a-af8a-ba2e1f7bb7eb port:37669, addresses:[/127.0.0.1]]. Tried addresses: [/127.0.0.1].	
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:67)	
	at org.gradle.internal.remote.internal.hub.MessageHubBackedClient.getConnection(MessageHubBackedClient.java:36)	
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:122)	
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:70)	
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:68)	
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:73)	
Caused by: java.net.ConnectException: Connection refused	
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)	
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)	
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)	
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.tryConnect(TcpOutgoingConnector.java:81)	
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:54)

@romseygeek
Contributor

@jaymode
Member

jaymode commented Jul 22, 2020

Another occurrence on 7.x: https://gradle-enterprise.elastic.co/s/sjj3dzchr5nau

Unexpected exception thrown.
org.gradle.internal.remote.internal.MessageIOException: Could not read message from '/127.0.0.1:33632'.
        at org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:94)
        at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionReceive.run(MessageHub.java:270)
        at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
        at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
        at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
        at java.lang.Thread.run(Thread.java:832)
Caused by: java.lang.IllegalArgumentException: 
        at org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:72)
        at org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:52)
        at org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:81)
        at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionReceive.run(MessageHub.java:270)
        at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
        at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
        at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
        at java.lang.Thread.run(Thread.java:832)
org.gradle.internal.remote.internal.ConnectException: Could not connect to server [1c684671-635a-4952-b141-a834070f48cf port:38057, addresses:[/127.0.0.1]]. Tried addresses: [/127.0.0.1].
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:67)
	at org.gradle.internal.remote.internal.hub.MessageHubBackedClient.getConnection(MessageHubBackedClient.java:36)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:123)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:71)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.tryConnect(TcpOutgoingConnector.java:81)
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:54)
	... 5 more

@rjernst
Member

rjernst commented Aug 6, 2020

@mark-vieira Any suggestion on how we can track down these failures? They appear to still be happening. Perhaps @breskeby could take a look?

@breskeby
Contributor

breskeby commented Aug 6, 2020

Do we have any way to reproduce one of the cases mentioned? I can take a closer look to figure out the root cause, but it's hard with no way of reproducing, and they seem to occur quite rarely (but often enough to be disruptive). If you see one, please link the Gradle Enterprise build scan for the corresponding build.

@rjernst
Member

rjernst commented Aug 6, 2020

@breskeby I believe this only happens in CI. I have never seen it locally. Here is one of those failures from today:
https://gradle-enterprise.elastic.co/s/wr2dr3w2hqcos

@mark-vieira
Contributor Author

My current guess is that we have internal cluster tests spinning nodes up on ports conflicting with either the daemon or test executor. I tweaked the way we capture the build logs for the GCP upload to include the full junit xml reports (which include stdout). The intention was to see if I can find the daemon/worker port anywhere in those logs. I haven't had a chance to actually go back and investigate after adding the logging though.

@breskeby
Contributor

breskeby commented Aug 13, 2020

I doubt this issue is related to port conflicts. My best guess at the moment is that something in the tests crashes the test executor JVM (e.g. by calling System.exit(1) explicitly). Another potential issue might be tests conflicting with, or using, a custom SecurityManager. Looking at some test fixtures, I see some explicit System.exit(1) calls in there. I'll dig a bit deeper in this direction.
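To illustrate the System.exit hypothesis, here is a minimal sketch, assuming JDK 11 or newer (it relies on single-file source launch) and using hypothetical class and method names. It starts a throwaway child JVM whose main method calls System.exit(1) and reads back the exit value, which is the same "non-zero exit value 1" symptom reported for the test executor. It says nothing about which test, if any, is actually responsible.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class; not part of the Elasticsearch build or of Gradle.
public class ExitingWorkerDemo {

    /**
     * Starts a throwaway child JVM whose main method calls System.exit(code)
     * and returns the exit value observed by the parent process.
     */
    public static int runChildThatExits(int code) {
        try {
            Path dir = Files.createTempDirectory("exit-demo");
            dir.toFile().deleteOnExit();
            Path source = dir.resolve("Exiting.java");
            Files.writeString(source,
                "public class Exiting { public static void main(String[] args) { System.exit("
                    + code + "); } }");
            source.toFile().deleteOnExit();

            // Launch the same JVM that runs this demo, in single-file source mode.
            String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
            Process child = new ProcessBuilder(javaBin, source.toString())
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
            return child.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child JVM", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("child exit value: " + runChildThatExits(1));
    }
}
```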

@mark-vieira
Contributor Author

When I initially looked at the daemon logs, there were messages about failing to communicate with the test worker. We had an identical issue with a regression in a recent Gradle release regarding local interface binding.

@breskeby
Contributor

breskeby commented Aug 13, 2020 via email

@tvernum
Contributor

tvernum commented Aug 27, 2020

@droberts195
Contributor

Another case in https://gradle-enterprise.elastic.co/s/koolcoru744pi

Unexpected exception thrown.
org.gradle.internal.remote.internal.MessageIOException: Could not write '/127.0.0.1:50596'.
	at org.gradle.internal.remote.internal.inet.SocketConnection.flush(SocketConnection.java:140)
	at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionDispatch.run(MessageHub.java:337)
	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
	at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.io.IOException: Connection reset by peer
	at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
	at java.base/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:62)
	at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:58)
	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:50)
	at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:484)
	at org.gradle.internal.remote.internal.inet.SocketConnection$SocketOutputStream.writeWithNonBlockingRetry(SocketConnection.java:279)
	at org.gradle.internal.remote.internal.inet.SocketConnection$SocketOutputStream.writeBufferToChannel(SocketConnection.java:267)
	at org.gradle.internal.remote.internal.inet.SocketConnection$SocketOutputStream.flush(SocketConnection.java:261)
	at org.gradle.internal.remote.internal.inet.SocketConnection.flush(SocketConnection.java:138)
	... 7 more

@droberts195
Contributor

Another case in https://gradle-enterprise.elastic.co/s/uo3zx5kuf6yak

* What went wrong:
Execution failed for task ':x-pack:plugin:ilm:internalClusterTest'.
> Process 'Gradle Test Executor 803' finished with non-zero exit value 1
  This problem might be caused by incorrect test process configuration.
  Please refer to the test execution section in the User Manual at https://docs.gradle.org/6.6.1/userguide/java_testing.html#sec:test_execution

@markharwood
Contributor

Another case in https://gradle-enterprise.elastic.co/s/ez5axqidlgvxk
7.9 CI LINUX elastic+elasticsearch+7.9+multijob-unix-compatibility os=debian-9&&immutable

org.gradle.internal.remote.internal.ConnectException: Could not connect to server [06833148-67c0-4c86-8dd4-326b5691c2ca port:42837, addresses:[/127.0.0.1]]. Tried addresses: [/127.0.0.1].	
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:67)	
	at org.gradle.internal.remote.internal.hub.MessageHubBackedClient.getConnection(MessageHubBackedClient.java:36)	
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:123)	
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:71)	
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)	
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)	
Caused by: java.net.ConnectException: Connection refused	
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)	
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)	
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)	
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.tryConnect(TcpOutgoingConnector.java:81)	
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:54)	
	... 5 more

mark-vieira added the Team:Delivery label and removed the Team:Core/Infra label Nov 11, 2020
@przemekwitek
Contributor

Yet another case in https://gradle-enterprise.elastic.co/s/cayul3sa6ufow
FAILURE #7.10 55e79dd release-tests - 20201207090007-9D5447B0

10:03:02 org.gradle.internal.remote.internal.ConnectException: Could not connect to server [bf866115-6be4-4209-9610-b45749518015 port:37991, addresses:[/127.0.0.1]]. Tried addresses: [/127.0.0.1].
10:03:02 	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:67)
10:03:02 	at org.gradle.internal.remote.internal.hub.MessageHubBackedClient.getConnection(MessageHubBackedClient.java:36)
10:03:02 	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:123)
10:03:02 	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:71)
10:03:02 	at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
10:03:02 	at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
10:03:02 Caused by: java.net.ConnectException: Connection refused
10:03:02 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
10:03:02 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
10:03:02 	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)
10:03:02 	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.tryConnect(TcpOutgoingConnector.java:81)
10:03:02 	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:54)
10:03:02 	... 5 more
10:03:02 Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
10:03:12 Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8

@nik9000
Member

nik9000 commented Dec 23, 2020

Another one: https://gradle-enterprise.elastic.co/s/2uspiahitbuwm

Unexpected exception thrown.
org.gradle.internal.remote.internal.MessageIOException: Could not write '/127.0.0.1:51968'.
        at org.gradle.internal.remote.internal.inet.SocketConnection.flush(SocketConnection.java:140)
        at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionDispatch.run(MessageHub.java:337)
        at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
        at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
        at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
        at java.lang.Thread.run(Thread.java:832)
Caused by: java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:62)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
        at sun.nio.ch.IOUtil.write(IOUtil.java:58)
        at sun.nio.ch.IOUtil.write(IOUtil.java:50)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:484)
        at org.gradle.internal.remote.internal.inet.SocketConnection$SocketOutputStream.writeWithNonBlockingRetry(SocketConnection.java:279)
        at org.gradle.internal.remote.internal.inet.SocketConnection$SocketOutputStream.writeBufferToChannel(SocketConnection.java:267)
        at org.gradle.internal.remote.internal.inet.SocketConnection$SocketOutputStream.flush(SocketConnection.java:261)
        at org.gradle.internal.remote.internal.inet.SocketConnection.flush(SocketConnection.java:138)
        at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionDispatch.run(MessageHub.java:337)
        at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
        at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
        at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
        at java.lang.Thread.run(Thread.java:832)
Unexpected exception thrown.
org.gradle.internal.remote.internal.MessageIOException: Could not read message from '/127.0.0.1:51968'.
        at org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:94)
        at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionReceive.run(MessageHub.java:270)
        at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
        at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
        at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
        at java.lang.Thread.run(Thread.java:832)
Caused by: java.lang.IllegalArgumentException: 
        at org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:72)
        at org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:52)
        at org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:81)
        at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionReceive.run(MessageHub.java:270)
        at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
        at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
        at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
        at java.lang.Thread.run(Thread.java:832)
org.gradle.internal.remote.internal.ConnectException: Could not connect to server [f54c9ebc-e5f7-4052-a755-33f1087fde35 port:34431, addresses:[/127.0.0.1]]. Tried addresses: [/127.0.0.1].
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:67)
	at org.gradle.internal.remote.internal.hub.MessageHubBackedClient.getConnection(MessageHubBackedClient.java:36)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:123)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:71)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.tryConnect(TcpOutgoingConnector.java:81)
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:54)
	... 5 more

@probakowski
Contributor

Another one: https://gradle-enterprise.elastic.co/s/kcx3euig3ntfc

org.gradle.internal.remote.internal.MessageIOException: Could not read message from '/127.0.0.1:35914'.
	at org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:94)
	at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionReceive.run(MessageHub.java:270)
	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
	at java.lang.Thread.run(Thread.java:832)
Caused by: java.lang.IllegalArgumentException:
	at org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:72)
	at org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:52)
	at org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:81)
	at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionReceive.run(MessageHub.java:270)
	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
	at java.lang.Thread.run(Thread.java:832)
org.gradle.internal.remote.internal.ConnectException: Could not connect to server [cd35320b-fcb9-4dfd-a4fe-65d6fd5f2c47 port:36481, addresses:[/127.0.0.1]]. Tried addresses: [/127.0.0.1].
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:67)
	at org.gradle.internal.remote.internal.hub.MessageHubBackedClient.getConnection(MessageHubBackedClient.java:36)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:123)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:71)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
Caused by: java.net.ConnectException: Connection refused
	at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:776)
	at java.base/sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:120)
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.tryConnect(TcpOutgoingConnector.java:81)
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:54)
	... 5 more


@mark-vieira
Contributor Author

This is definitely one of the worst non-test-related build issues we have. I'm still at a loss for ideas as to what is causing this or how best to mitigate it.

@mark-vieira
Contributor Author

@breskeby this is starting to be really problematic. We have at least a few of these every day. Any thoughts?

@breskeby
Contributor

No idea tbh what's causing this. I can invest some time next week to look into it. It's hard to diagnose as it happens so irregularly.

@mark-vieira
Contributor Author

Or some other mitigation? It would be great if we could retry in this scenario but I think the issue is the test worker is dying.

@mark-vieira
Contributor Author

I'm wondering if we could take some inspiration from the test retry plugin to reexecute when we encounter a connect exception?

https://github.com/gradle/test-retry-gradle-plugin
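
As a rough illustration of the idea of re-executing on a connect exception, here is a minimal sketch in plain Java. It is not the test-retry plugin's API and not Gradle code: `RetryOnWorkerCrash`, `looksLikeWorkerConnectionFailure` and `runWithRetry` are hypothetical names, and the message substring is taken from the error output quoted earlier in this issue.

```java
import java.net.ConnectException;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper; names and classification rules are assumptions for illustration.
public class RetryOnWorkerCrash {

    /** Walks the cause chain looking for signs of a lost worker connection. */
    public static boolean looksLikeWorkerConnectionFailure(Throwable failure) {
        for (Throwable t = failure; t != null; t = t.getCause()) {
            if (t instanceof ConnectException) {
                return true;
            }
            String message = t.getMessage();
            if (message != null && message.contains("finished with non-zero exit value")) {
                return true;
            }
        }
        return false;
    }

    /** Runs the action, retrying only failures accepted by the predicate, up to maxAttempts runs. */
    public static <T> T runWithRetry(Supplier<T> action, Predicate<Throwable> retryable, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (!retryable.test(e)) {
                    throw e; // a genuine failure: do not mask it with a retry
                }
                last = e;
            }
        }
        throw last; // every attempt failed with a retryable error
    }
}
```

The important design choice is the narrow predicate: only failures that look like a dead worker are retried, so real test failures still fail the build on the first attempt.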

breskeby added a commit that referenced this issue Apr 21, 2021
Related to #52610 this PR introduces a rerun of all tests for a test task if the test jvm 
has crashed because of a system exit. We furthermore log potential tests that caused 
the System.exit based on which tests have been active at the time of the system exit.

We also modified the build scan logic to track unexpected test JVM exits 
with the tag `unexpected-test-jvm-exit`
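
The "tests active at the time of the system exit" bookkeeping described in this commit message could look roughly like the sketch below. This is not the code from the PR: `ActiveTestTracker` and its methods are hypothetical names, and the real change hooks into Gradle's test execution rather than being called by hand.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical tracker; an assumption-based sketch, not the implementation from the PR.
public class ActiveTestTracker {

    // Tests that have started but not yet finished; safe to update from several threads.
    private final Set<String> active = new ConcurrentSkipListSet<>();

    public void testStarted(String name) {
        active.add(name);
    }

    public void testFinished(String name) {
        active.remove(name);
    }

    /** Snapshot of tests still running, i.e. candidates to inspect after an unexpected JVM exit. */
    public Set<String> suspects() {
        return new TreeSet<>(active);
    }
}
```

If the test JVM disappears, whatever is left in `suspects()` is the list of tests worth logging before the task is rerun.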
breskeby added a commit to breskeby/elasticsearch that referenced this issue Apr 21, 2021
@mark-vieira
Contributor Author

This should be addressed by #71881.

breskeby added a commit that referenced this issue Apr 22, 2021
@cbuescher
Member

@mark-vieira found this issue on test triage today googling the exception line, so maybe you want to take another look even though this is closed?

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+multijob+platform-support-unix/os=debian-9&&immutable/38/console

Build: https://gradle-enterprise.elastic.co/s/3zxtbqrj2zivg

15:15:10 Unexpected exception thrown.
15:15:10 org.gradle.internal.remote.internal.MessageIOException: Could not read message from '/127.0.0.1:45320'.
15:15:10 	at org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:94)
15:15:10 	at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionReceive.run(MessageHub.java:270)
15:15:10 	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
15:15:10 	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
15:15:10 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
15:15:10 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
15:15:10 	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
15:15:10 	at java.base/java.lang.Thread.run(Thread.java:832)
15:15:10 Caused by: java.lang.IllegalArgumentException
15:15:10 	at org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:72)
15:15:10 	at org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:52)
15:15:10 	at org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:81)
15:15:10 	... 7 more

@mark-vieira
Contributor Author

Yeah, we had a fix but it caused other issues so we reverted. Given that, I think it makes sense to reopen this for visibility.

mark-vieira reopened this May 25, 2021
@benwtrent
Member

Happened again: https://gradle-enterprise.elastic.co/s/oerbc2dcndaf6

Unexpected exception thrown.
org.gradle.internal.remote.internal.MessageIOException: Could not read message from '/127.0.0.1:60286'.
	at org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:94)
	at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionReceive.run(MessageHub.java:270)
	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
	at java.lang.Thread.run(Thread.java:831)
Caused by: java.lang.IllegalArgumentException:
	at org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:72)
	at org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:52)
	at org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:81)
	at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionReceive.run(MessageHub.java:270)
	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
	at java.lang.Thread.run(Thread.java:831)
org.gradle.internal.remote.internal.ConnectException: Could not connect to server [db2adc07-afb3-450f-ad78-23396fb168e0 port:35289, addresses:[/127.0.0.1]]. Tried addresses: [/127.0.0.1].
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:67)
	at org.gradle.internal.remote.internal.hub.MessageHubBackedClient.getConnection(MessageHubBackedClient.java:36)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:123)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:71)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.tryConnect(TcpOutgoingConnector.java:81)
	at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:54)
	... 5 more

@breskeby
Contributor

breskeby commented Sep 7, 2021

We saw a different but similar failure today: https://gradle-enterprise.elastic.co/s/eicalr3x7xio4/console-log?task=:x-pack:plugin:spatial:test

@cbuescher
Member

@probakowski
Contributor

@ywangd
Member

ywangd commented Nov 24, 2021

These failures today on 7.15 and 7.16 are of the same nature:

@tvernum
Contributor

tvernum commented Jan 4, 2022

@ywangd
Member

ywangd commented Jan 10, 2022

@ywangd
Member

ywangd commented Feb 2, 2022

@mark-vieira mark-vieira added the low-risk An open issue or test failure that is a low risk to future releases label Oct 9, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-delivery (Team:Delivery)

@DaveCTurner
Contributor

Here is another failure of this nature, this time with a SEGV in the JVM:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000ffffa0996a10, pid=71459, tid=120356
#
# JRE version: OpenJDK Runtime Environment (21.0+35) (build 21+35-2513)
# Java VM: OpenJDK 64-Bit Server VM (21+35-2513, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# C  [libc.so.6+0x7aa10]  strlen+0x10
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /dev/shm/elastic+elasticsearch+main+multijob+platform-support-arm/server/build/testrun/internalClusterTest/hs_err_pid71459.log
[117.600s][warning][os] Loading hsdis library failed
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

@mark-vieira
Contributor Author

Where did you find that error output? I couldn't find anything relevant in the GCP upload.

@DaveCTurner
Contributor

@mark-vieira
Contributor Author

https://gradle-enterprise.elastic.co/s/lwkqauz34576g/console-log?page=6#L5354

How did I not see that? 🤦

We can start by making sure we capture the hs_err_pid71459.log file so we can see what's going on. I'll get that done.
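
For illustration only, here is a minimal sketch of what capturing those files could look like as a post-build step. It is not the actual CI change: the function name `collect_hs_err_logs` and both directory arguments are hypothetical, and the only thing taken from the log above is that the JVM writes its fatal error report as `hs_err_pid<pid>.log` under the test run directory.

```python
import shutil
from pathlib import Path


def collect_hs_err_logs(build_root, dest_dir):
    """Copy every hs_err_pid*.log found under build_root into dest_dir.

    Both arguments are hypothetical paths chosen by the caller; nothing
    here reflects the real Elasticsearch CI layout.
    """
    root = Path(build_root).resolve()
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)

    copied = []
    for log in sorted(root.rglob("hs_err_pid*.log")):
        # Skip directories and anything already sitting in the destination.
        if not log.is_file() or dest in log.parents:
            continue
        # Flatten the relative path into the file name so that reports from
        # different test tasks cannot overwrite each other.
        flat_name = "_".join(log.relative_to(root).parts)
        target = dest / flat_name
        shutil.copy2(log, target)
        copied.append(target)
    return copied
```

Something like `collect_hs_err_logs("server/build", "ci-artifacts/jvm-crashes")` would then leave the crash reports in a single directory that the existing upload step could pick up, assuming such a directory is already archived.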

@mark-vieira
Contributor Author

The last batch of these were caused by an interaction between Lucene and a JVM bug. That's been resolved so I'm going to close this for now. We can open a new issue if this starts happening again for some other cause.
