Netty provider blocks connections forever on SelectorUtil.select() #324

Closed
jbrittain opened this issue Jun 14, 2013 · 9 comments

@jbrittain

When making many async requests from many threads concurrently, where each request is a GET for a 512k JSON file, a small percentage (~4%) of the requests block and stay blocked until they get timed out and expired on the client side. I commented out the expire() call in NettyAsyncHttpProvider so that it stops expiring them, and let them sit: they stay blocked, even after the server has disconnected all socket connections. Here's a stack dump of where they block:

"New I/O worker #1" daemon prio=5 tid=7feff6974800 nid=0x119e32000 runnable [119e31000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:136)
at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:69)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <79a6e3228> (a sun.nio.ch.Util$2)
- locked <79a6e3210> (a java.util.Collections$UnmodifiableSet)
- locked <79aed0ac8> (a sun.nio.ch.KQueueSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:64)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:409)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:206)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:680)

This is on Linux, and happens regardless of the brand and version of JDK. ulimit -n is properly set to 16384. This is 100% reproducible on my machines with 100 threads, 40 requests per thread, 10 batches. If I decrease the number of requests per thread down to 1, it all runs to completion with nothing blocked. If I decrease the number of threads down to 50, with 40 requests per thread, it also all runs and nothing gets blocked.

The server is a separate machine running Tomcat 7 on Linux, using the NIO connector, carefully configured. ulimit for the Tomcat JVM is properly set to 16384. On the server side, all requests are being given a 200 status code and I see no errors in the logs.
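
For reference, a minimal reproduction sketch of the scenario described above might look roughly like the following. This is not the actual benchmark code; the com.ning.http.client API usage, the URL, and the class name are assumptions for illustration only.

```java
// Hypothetical reproduction sketch, assuming the com.ning.http.client
// AsyncHttpClient API of that era. URL and counts mirror the report above.
import com.ning.http.client.AsyncHttpClient;
import com.ning.http.client.Response;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;

public class SelectBlockRepro {
    private static final String URL = "http://server:8080/large.json"; // ~512k JSON payload
    private static final int THREADS = 100;
    private static final int REQUESTS_PER_THREAD = 40;

    public static void main(String[] args) throws Exception {
        final AsyncHttpClient client = new AsyncHttpClient();
        final CountDownLatch done = new CountDownLatch(THREADS);
        for (int t = 0; t < THREADS; t++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        // Fire all requests for this thread asynchronously, then wait on them.
                        List<Future<Response>> futures = new ArrayList<Future<Response>>();
                        for (int i = 0; i < REQUESTS_PER_THREAD; i++) {
                            futures.add(client.prepareGet(URL).execute());
                        }
                        for (Future<Response> f : futures) {
                            f.get(); // in the reported scenario, a small percentage never complete
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        done.countDown();
                    }
                }
            }).start();
        }
        done.await();
        client.close();
    }
}
```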

@jbrittain
Author

Forgot to mention: I'm using AHC 2.0.0-SNAPSHOT, which uses Netty 3.6.6.

@rlubke
Contributor

rlubke commented Jun 18, 2013

I'm curious - do you see the same issue with the Grizzly provider (if you're willing to even test it)?

@jbrittain
Author

Yes, I am indeed testing it. I'm benchmarking all of these clients via my own fork of your benchmark code (the java-http-client-benchmark project; thanks for that! I'll be recontributing my tree as soon as I can). My benchmark suite has tests for sync, async, sync large responses, and async large responses, and covers these clients: AHC w/Netty, AHC w/Grizzly, Jetty 8 client, Jetty 9 client, and HttpComponents 4.

Of those, so far the Jetty client, in either version, performs the best, with the lowest GC activity and the lowest error count, at least in my tests and on my machines. Second is HttpComponents 4 (though we decided we can't use it because too many features are missing), and third is AHC w/Netty. Grizzly is close to AHC w/Netty in performance, except that I get quite a few more errors and less consistent behavior. With AHC, using either the Netty or Grizzly provider, I see GCs happening very frequently; with the Jetty client I see them happening very infrequently.

@rlubke
Contributor

rlubke commented Jun 18, 2013

Could you please open one or more issues with details on the problems you're seeing with the Grizzly provider? We, the Grizzly team, will do our best to address them.

@jbrittain
Author

Sure. I'll do that next.

@jfarcand
Contributor

@jbrittain Don't forget to contribute back the changes you have made to the load tester, whatever results you are getting. Thanks!

@slandelle
Contributor

@jbrittain I know that it's been a long, long time; sorry for the delay.
This really looks like a JDK bug. Which one did you use? Did you try upgrading to a recent one?

@slandelle
Contributor

@jbrittain Ping. Will be closing soon otherwise.

@slandelle
Contributor

No feedback, closing.

cs-workco pushed a commit to cs-workco/async-http-client that referenced this issue Apr 13, 2023
Motivation:

Currently when either we or the server send Connection: close, we
correctly do not return that connection to the pool. However, we rely on
the server actually performing the connection closure: we never call
close() ourselves. This is unnecessarily optimistic: a server may
absolutely fail to close this connection. To protect our own file
descriptors, we should make sure that any connection we do not return
to the pool is closed.

Modifications:

If we think a connection is closing when we release it, we now call
close() on it defensively.

Result:

We no longer leak connections when the server fails to close them.

Fixes AsyncHttpClient#324.
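
For illustration, the defensive close described in the commit message amounts to roughly the following sketch. The class, field, and method names here are hypothetical (they are not the actual async-http-client internals), and it assumes Netty 4's Channel API.

```java
// Hypothetical sketch only, not the real async-http-client connection manager.
import io.netty.channel.Channel;

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class ConnectionReleaser {
    private final Queue<Channel> pool = new ConcurrentLinkedQueue<Channel>();

    void release(Channel channel, boolean keepAlive) {
        if (keepAlive && channel.isActive()) {
            // Healthy keep-alive connection: return it to the pool for reuse.
            pool.offer(channel);
        } else {
            // Either side sent Connection: close, or the channel is unusable.
            // Don't rely on the server to tear the connection down; close it
            // ourselves so the file descriptor is never leaked.
            channel.close();
        }
    }
}
```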