Netty provider blocks connections forever on SelectorUtil.select() #324

Closed
jbrittain opened this issue Jun 14, 2013 · 9 comments

@jbrittain

When making many async requests from many threads concurrently, where each request is a GET for a 512k JSON file, a small percentage (~4%) of the requests block and stay blocked until they get timed out and expired on the client side. I commented out the expire() call in NettyAsyncHttpProvider so that it stops expiring them, and let them sit: they stay blocked, even after the server has disconnected all socket connections. Here's a stack dump of where they block:

"New I/O worker #1" daemon prio=5 tid=7feff6974800 nid=0x119e32000 runnable [119e31000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:136)
at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:69)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <79a6e3228> (a sun.nio.ch.Util$2)
- locked <79a6e3210> (a java.util.Collections$UnmodifiableSet)
- locked <79aed0ac8> (a sun.nio.ch.KQueueSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:64)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:409)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:206)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:680)

This is on Linux, and happens regardless of the brand and version of JDK. ulimit -n is properly set to 16384. This is 100% reproducible on my machines with 100 threads, 40 requests per thread, 10 batches. If I decrease the number of requests per thread down to 1, it all runs to completion with nothing blocked. If I decrease the number of threads down to 50, with 40 requests per thread, it also all runs and nothing gets blocked.

The server is a separate machine running Tomcat 7 on Linux, using the NIO connector, carefully configured. ulimit for the Tomcat JVM is properly set to 16384. On the server side, all requests are being given a 200 status code and I see no errors in the logs.
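
For reference, a minimal reproduction sketch of the scenario described above might look roughly like the following. This is not the actual benchmark code; the com.ning.http.client API usage, the URL, and the class name are assumptions for illustration only.

```java
// Hypothetical reproduction sketch, assuming the com.ning.http.client
// AsyncHttpClient API of that era. URL and counts mirror the report above.
import com.ning.http.client.AsyncHttpClient;
import com.ning.http.client.Response;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;

public class SelectBlockRepro {
    private static final String URL = "http://server:8080/large.json"; // ~512k JSON payload
    private static final int THREADS = 100;
    private static final int REQUESTS_PER_THREAD = 40;

    public static void main(String[] args) throws Exception {
        final AsyncHttpClient client = new AsyncHttpClient();
        final CountDownLatch done = new CountDownLatch(THREADS);
        for (int t = 0; t < THREADS; t++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        // Fire all requests for this thread asynchronously, then wait on them.
                        List<Future<Response>> futures = new ArrayList<Future<Response>>();
                        for (int i = 0; i < REQUESTS_PER_THREAD; i++) {
                            futures.add(client.prepareGet(URL).execute());
                        }
                        for (Future<Response> f : futures) {
                            f.get(); // in the reported scenario, a small percentage never complete
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        done.countDown();
                    }
                }
            }).start();
        }
        done.await();
        client.close();
    }
}
```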

@jbrittain
Author

Forgot to mention: I'm using AHC 2.0.0-SNAPSHOT, which uses Netty 3.6.6.

@rlubke
Contributor

rlubke commented Jun 18, 2013

I'm curious - do you see the same issue with the Grizzly provider (if you're willing to even test it)?

@jbrittain
Author

Yes, I am indeed testing it. I'm benchmarking all of these clients via my own fork of your benchmark code (the java-http-client-benchmark project; thanks for that! I'll be recontributing my tree as soon as I can). My benchmark suite has tests for sync, async, sync large responses, and async large responses, and covers these clients: AHC w/Netty, AHC w/Grizzly, Jetty 8 client, Jetty 9 client, and HttpComponents 4.

Of those, so far the Jetty client, in either version, performs the best, with the lowest GC activity and the lowest error count, at least in my tests and on my machines. Second is HttpComponents 4 (though we decided we can't use it because too many features are missing), and third is AHC w/Netty. Grizzly is close to AHC w/Netty in performance, except that I get quite a few more errors and less consistent behavior. With AHC, using either the Netty or Grizzly provider, I see GCs happening very frequently; with the Jetty client I see them happening very infrequently.

@rlubke
Contributor

rlubke commented Jun 18, 2013

Could you please open one or more issues with details on the problems you're seeing with the Grizzly provider? We, the Grizzly team, will do our best to address them.

@jbrittain
Author

Sure. I'll do that next.

@jfarcand
Contributor

@jbrittain Don't forget to contribute back the changes you have made to the load tester, whatever results you are getting. Thanks!

@slandelle
Contributor

@jbrittain I know that it's been a long, long time; sorry for the delay.
This really looks like a JDK bug. Which one did you use? Did you try upgrading to a recent one?

@slandelle
Contributor

@jbrittain Ping. Will be closing soon otherwise.

@slandelle
Contributor

No feedback, closing.

cs-workco pushed a commit to cs-workco/async-http-client that referenced this issue Apr 13, 2023
Motivation:

Currently when either we or the server send Connection: close, we
correctly do not return that connection to the pool. However, we rely on
the server actually performing the connection closure: we never call
close() ourselves. This is unnecessarily optimistic: a server may
absolutely fail to close this connection. To protect our own file
descriptors, we should make sure that any connection we do not return
to the pool is closed.

Modifications:

If we think a connection is closing when we release it, we now call
close() on it defensively.

Result:

We no longer leak connections when the server fails to close them.

Fixes AsyncHttpClient#324.
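
For illustration, the defensive close described in the commit message amounts to roughly the following sketch. The class, field, and method names here are hypothetical (they are not the actual async-http-client internals), and it assumes Netty 4's Channel API.

```java
// Hypothetical sketch only, not the real async-http-client connection manager.
import io.netty.channel.Channel;

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class ConnectionReleaser {
    private final Queue<Channel> pool = new ConcurrentLinkedQueue<Channel>();

    void release(Channel channel, boolean keepAlive) {
        if (keepAlive && channel.isActive()) {
            // Healthy keep-alive connection: return it to the pool for reuse.
            pool.offer(channel);
        } else {
            // Either side sent Connection: close, or the channel is unusable.
            // Don't rely on the server to tear the connection down; close it
            // ourselves so the file descriptor is never leaked.
            channel.close();
        }
    }
}
```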