"java.net.SocketException: Socket closed" when in a cluster mode + Docker + acquireHostList enabled #384

wajda · 2021-04-20T21:16:21Z

The issue was first discovered here AbsaOSS/spline#869

The error occurs under the following combination of circumstances: cluster mode + Docker + acquireHostList=true

My understanding of what is happening is the following.
When the VST connection is established, the respective HostHandler asks the VstCommunication class to refresh the host list from the server. When the new hosts are added to the set, the old ones (unless they point to exactly the same ip:port) are immediately discarded along with all associated connection pools and sockets.
The problem is that the connection instance that has just been created, which triggered the host list refresh in the first place and which is returned from the VstCommunication.connect() method, holds a pointer to a host that might have just been discarded (and its socket closed) during the refresh. As a result, in these circumstances VstCommunication.connect() returns a connection that is already dead at the moment of creation, with all the consequences.

This is exactly what happens when ArangoDB runs in a virtualized environment (Docker in our case) where the networking is set up so that the client process addresses the server via a different IP (or host name) than the one the server sees from inside its network.
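
To make the mechanism concrete, here is a minimal, self-contained Java sketch of the behaviour described above. It is only a model: `HostSetSketch`, `Host`, `add` and `refresh` are hypothetical names invented for illustration and are not the driver's actual classes. It assumes hosts are matched purely by their address string, so `localhost:8529` and `172.17.0.1:8529` count as different hosts even when they reach the same server.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the host-set refresh; not the driver's real implementation.
class HostSetSketch {

    /** A host entry that owns one "socket", modelled here as a simple closed flag. */
    static final class Host {
        final String address;   // "ip:port" or "name:port" as seen by whoever added it
        boolean closed = false;

        Host(String address) { this.address = address; }

        void close() { closed = true; }
    }

    private final Map<String, Host> hosts = new LinkedHashMap<>();

    /** Returns the existing entry for this address, or registers a new one. */
    Host add(String address) {
        return hosts.computeIfAbsent(address, Host::new);
    }

    /** Replaces the set with the addresses reported by the server, closing every host not in the new list. */
    void refresh(List<String> acquired) {
        hosts.entrySet().removeIf(entry -> {
            if (acquired.contains(entry.getKey())) {
                return false;           // same address string: keep the host and its socket
            }
            entry.getValue().close();   // different address string: discard and close
            return true;
        });
        for (String address : acquired) {
            add(address);
        }
    }

    public static void main(String[] args) {
        HostSetSketch set = new HostSetSketch();
        Host initial = set.add("localhost:8529");          // address the client was configured with
        set.refresh(Arrays.asList("172.17.0.1:8529"));     // address the server reports for itself
        System.out.println("initial host closed: " + initial.closed);
    }
}
```

In this model the caller still holds `initial` after the refresh, but its socket is already closed, which mirrors the dead connection returned from connect().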

The issue is reproducible by spinning up a DB cluster via arangodb-starter in Docker and running the ArangoDBTest.execute_acquireHostList_enabled() test method against it.

wajda · 2021-04-20T21:18:02Z

The solution would be to simply check if the connection instance is still alive before returning it from the VstCommunication.connect() method. If not, simply keep re-getting the connection from the host handler until a usable one is received.
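
A minimal sketch of that idea, under stated assumptions: `ConnectRetrySketch`, `Connection`, `isOpen` and the `Supplier` standing in for the host handler are all hypothetical names, not the driver's API. The sketch also adds an upper bound on the number of attempts, which is not part of the proposal above, so that a permanently broken host list cannot cause an endless loop.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical illustration of "check liveness before returning, otherwise re-get".
class ConnectRetrySketch {

    /** Stand-in for a driver connection; only liveness matters for this sketch. */
    interface Connection {
        boolean isOpen();
    }

    /**
     * Asks the (hypothetical) host handler for connections until one is open.
     * maxAttempts is an assumption added here to keep the loop bounded.
     */
    static Connection connect(Supplier<Connection> hostHandler, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Connection candidate = hostHandler.get();
            if (candidate != null && candidate.isOpen()) {
                return candidate;   // usable connection: hand it to the caller
            }
            // Otherwise its host was probably discarded during the refresh; try again.
        }
        throw new IllegalStateException("no usable connection after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        Connection dead = () -> false;   // socket closed by the host list refresh
        Connection live = () -> true;    // connection to a host from the refreshed list
        Deque<Connection> handedOut = new ArrayDeque<>(Arrays.asList(dead, live));

        Connection result = connect(handedOut::poll, 3);
        System.out.println("got live connection: " + (result == live));
    }
}
```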

rashtao · 2021-04-26T08:34:13Z

Hi @wajda,
how exactly do you start the cluster? And how exactly do you access it?

wajda · 2021-04-26T08:56:43Z

This happens in multiple environments. We first ran into this issue on Kubernetes on AWS, then my colleague reproduced it locally using Docker 20, while it worked for me on Fedora's Docker 19 (moby-engine). After installing Docker-ce 20, the issue occurred for me as well.

On my localhost I use the following setup to reproduce it:

Linux (Fedora 33),
Docker-ce 20 (everything is by default, no customization at all)
I start arangodb cluster via the arangodb-starter using the following command:

docker run -it --rm \
  --name=adb \
  -p 8528:8528 \
  -v /var/run/docker.sock:/var/run/docker.sock arangodb/arangodb-starter \
  --starter.local \
  --starter.address=172.17.0.1 \
  --docker.container=adb

(Not sure whether I used -v in my last tests; I tried different combinations, but it doesn't affect how the error occurs. It's consistently reproducible either way.)

For the driver config, enable acquireHostList. Otherwise nothing special.

Then I access it via VST on localhost

wajda · 2021-04-26T08:58:17Z

Here is our driver config: https://github.com/AbsaOSS/spline/blob/develop/persistence/src/main/scala/za/co/absa/spline/persistence/ArangoDatabaseFacade.scala#L40

rashtao · 2021-04-28T12:45:29Z

Hi @wajda,
I think the error is caused by the fact that you connect the driver to localhost, but since you have acquireHostList=true, the same host appears under a different name in the returned host list (e.g. tcp://172.17.0.1:8529).

Can you please try connecting the driver to 172.17.0.1:8529 instead of localhost?

wajda · 2021-04-28T15:48:28Z

On a local Docker setup, yes, that would work (as I mentioned in AbsaOSS/spline#869 (comment)). The problem is that it's not always possible in real prod environments with more complicated networking, where, for instance, IPs are auto-generated and aren't stable enough to be put in a config file. So the first client connection needs to be made via an alias, for example.

dvagapov · 2021-04-28T16:10:26Z

I have an ArangoDB cluster on Kubernetes.

Connection string to ArangoDB: arangodb-cluster.arango-namespace.svc.cluster.local:8529

Kubernetes doesn't have stable IPs and operates via DNS names:

Arango pods names:
arangodb-cluster-agnt-260nvfct-87c535 20.0.63.123
arangodb-cluster-agnt-glwzb7hd-87c535 20.0.63.81
arangodb-cluster-agnt-optv4hel-87c535 20.0.62.19

arangodb-cluster-crdn-c4eas9vw-044b59 20.0.62.185
arangodb-cluster-crdn-hrf4bdbi-044b59 20.0.63.232
arangodb-cluster-crdn-xraip1sl-044b59 20.0.63.99

arangodb-cluster-prmr-9njqj9u2-044b59 20.0.63.161
arangodb-cluster-prmr-aotybl68-044b59 20.0.62.167
arangodb-cluster-prmr-wdmsmksh-044b59 20.0.63.147

If I delete the pod "arangodb-cluster-prmr-9njqj9u2-044b59 20.0.63.161", Kubernetes will create a new pod with a new IP address.

rashtao · 2021-04-29T12:43:37Z

Thanks for clarifying, it makes sense to me now!

wajda mentioned this issue Apr 20, 2021

arangodb-java-driver-384 "java.net.SocketException: Socket closed" #385

Merged

rashtao self-assigned this Apr 21, 2021

rashtao closed this as completed in #385 Apr 29, 2021

rashtao added a commit that referenced this issue Dec 19, 2023

fixed management of hosts marked for deletion (DE-723, #384)

98d6b78
