Netty provider blocks connections forever on SelectorUtil.select() #324
Comments
Forgot to mention: I'm using AHC 2.0.0-SNAPSHOT, which uses Netty 3.6.6.
I'm curious - do you see the same issue with the Grizzly provider (if you're willing to even test it)?
Yes, I am indeed testing it. I'm benchmarking across all of these clients via my own fork of your benchmark code (the java-http-client-benchmark project; thanks for that! I'll be contributing my tree back as soon as I can). My benchmark suite has tests for sync, async, sync large responses, and async large responses, and covers these clients: AHC w/Netty, AHC w/Grizzly, Jetty 8 client, Jetty 9 client, and HttpComponents 4. Of those, so far the Jetty client, either version, performs about the best, with the lowest GC activity and the lowest error count, at least in my tests and on my machines. Second would be HttpComponents 4 (though we decided we can't use it because too many features are missing), and third is AHC w/Netty. Grizzly is close to AHC w/Netty's performance, except that I get quite a few more errors and less consistent behavior. With AHC, using either the Netty or Grizzly provider, I see GCs happening very frequently; with the Jetty client I see them happening very infrequently.
Could you please open one or more issues with details on the problems you're seeing with the Grizzly provider? We, the Grizzly team, will do our best to address them.
Sure. I'll do that next.
@jbrittain Don't forget to contribute back the changes you have made to the load tester, whatever results you are getting. Thanks!
@jbrittain I know it's been a long, long time, sorry for the delay.
@jbrittain Ping. Will be closing soon otherwise. |
No feedback, closing. |
Motivation: Currently, when either we or the server sends Connection: close, we correctly do not return that connection to the pool. However, we rely on the server actually performing the connection closure: we never call close() ourselves. This is unnecessarily optimistic: a server may simply fail to close the connection. To protect our own file descriptors, we should make sure that any connection we do not return to the pool is closed.
Modifications: If we think a connection is closing when we release it, we now call close() on it defensively.
Result: We no longer leak connections when the server fails to close them. Fixes AsyncHttpClient#324.
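For illustration, here is a minimal sketch of the defensive-close pattern that commit message describes, written against Netty 3's Channel type. ConnectionReleaser, ChannelPool, offer(), and closeWasRequested are hypothetical names used only for this sketch, not the project's actual classes.

```java
// Minimal sketch (not the actual patch) of the defensive-close pattern:
// on release, either return the channel to the pool or close it ourselves.
import org.jboss.netty.channel.Channel;

final class ConnectionReleaser {

    /** Hypothetical minimal pool abstraction, just enough for the sketch. */
    interface ChannelPool {
        void offer(Channel channel);
    }

    private final ChannelPool pool;

    ConnectionReleaser(ChannelPool pool) {
        this.pool = pool;
    }

    /**
     * Either returns the channel to the pool for reuse, or, if Connection: close
     * was seen on the request or response, closes it instead of trusting the
     * server to do so.
     */
    void release(Channel channel, boolean closeWasRequested) {
        if (closeWasRequested) {
            // Don't rely on the remote peer: close defensively so the file
            // descriptor is reclaimed even if the server never closes its end.
            channel.close();
        } else {
            pool.offer(channel);
        }
    }
}
```

Closing eagerly here is safe even if the server does eventually close its end, since closing an already-closed channel has no further effect, and it guarantees the file descriptor is released either way.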
When making many async requests from many threads concurrently, where each request is a GET for a 512k JSON file, a small percentage (~4%) of requests get blocked and stay blocked until the request is timed out and expired on the client side. I commented out the expire() call in NettyAsyncHttpProvider so that it stops expiring them, and let them sit: they stay blocked, even after the server has disconnected all socket connections. Here's a stack dump of where they block:
"New I/O worker #1" daemon prio=5 tid=7feff6974800 nid=0x119e32000 runnable [119e31000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:136)
at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:69)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <79a6e3228> (a sun.nio.ch.Util$2)
- locked <79a6e3210> (a java.util.Collections$UnmodifiableSet)
- locked <79aed0ac8> (a sun.nio.ch.KQueueSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:64)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:409)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:206)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:680)
This is on Linux, and happens regardless of the JDK vendor and version. ulimit -n is properly set to 16384. It is 100% reproducible on my machines with 100 threads, 40 requests per thread, 10 batches. If I decrease the number of requests per thread to 1, everything runs to completion with nothing blocked. If I decrease the number of threads to 50, with 40 requests per thread, it also all runs and nothing gets blocked.
The server is a separate machine running Tomcat 7 on Linux, using the NIO connector, carefully configured. ulimit for the Tomcat JVM is properly set to 16384. On the server side, all requests are being given a 200 status code and I see no errors in the logs.
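For reference, here is a minimal, self-contained sketch of the load pattern described above (100 threads, 40 async GETs per thread against a large JSON resource). It is not the actual benchmark code: the class name, URL, and structure are illustrative, and the com.ning.http.client API shown is an assumption based on the AHC builds of that era.

```java
// Sketch of the described load: 100 caller threads, each firing 40 async GETs
// and blocking on the returned future. Batching and timing are omitted.
import com.ning.http.client.AsyncHttpClient;
import com.ning.http.client.Response;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncGetLoad {

    private static final String URL = "http://server.example.com/big.json"; // placeholder
    private static final int THREADS = 100;
    private static final int REQUESTS_PER_THREAD = 40;

    public static void main(String[] args) throws Exception {
        final AsyncHttpClient client = new AsyncHttpClient();
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        try {
            List<Future<?>> workers = new ArrayList<Future<?>>();
            for (int t = 0; t < THREADS; t++) {
                workers.add(pool.submit(new Runnable() {
                    public void run() {
                        for (int i = 0; i < REQUESTS_PER_THREAD; i++) {
                            try {
                                // Fire the async request, then block this caller
                                // thread on the future.
                                Response response = client.prepareGet(URL).execute().get();
                                if (response.getStatusCode() != 200) {
                                    System.err.println("Unexpected status: " + response.getStatusCode());
                                }
                            } catch (Exception e) {
                                e.printStackTrace();
                            }
                        }
                    }
                }));
            }
            for (Future<?> w : workers) {
                w.get(); // a blocked request shows up here as a hang
            }
        } finally {
            pool.shutdown();
            client.close();
        }
    }
}
```

Under the conditions described above, a blocked request would surface as a caller thread stuck in the final join loop while the corresponding New I/O worker sits in SelectorUtil.select(), as in the stack dump.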