
[ingress/controllers/nginx] Nginx shutdown doesn't gracefully close "keep-alive" connections #1123

Closed
micheleorsi opened this issue Jun 2, 2016 · 7 comments


@micheleorsi

We are analyzing the behaviour of re-deploying the nginx ingress controller while it is flooded with requests. Basically we use gatling or ab (the Apache Bench command-line tool) to perform a lot of parallel requests against our kubernetes cluster for a while.

With the default nginx configuration we discovered that:

  • if clients don't request keep-alive connections, the process is really smooth (0 errors)
  • if clients request keep-alive connections, we see a lot of failures (java.io.IOException: Remotely closed); see the sketch below
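
For context, the two client behaviours correspond to ab runs with and without the -k flag; a minimal sketch, using the same placeholder endpoint as the full run further below:

```sh
# without keep-alive: every request opens a fresh connection (clean runs)
ab -c 100 -n 100000 'http://<our-test-endpoint>/load'

# with keep-alive (-k): connections are reused across requests, and these
# are the runs that fail with "Remotely closed" during a redeploy
ab -k -c 100 -n 100000 'http://<our-test-endpoint>/load'
```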

We tried several things; the latest was to gracefully shut down nginx in the preStop hook with this command:

```sh
/usr/sbin/nginx -s quit
```

The expected behaviour would be that nginx maintains keep-alive connections until the SIGTERM arrives. Then, once it receives the -s quit, it stops honouring keep-alive on new requests and notifies clients (via a "Connection: close" response header) that they should close the kept-alive connections.
On the other hand, the observed behaviour is that nginx continues to use the live connections until it dies, and clients receive "java.io.IOException: Remotely closed".

Finally we also tried setting the keepalive_timeout parameter to 0 in the nginx configuration. This way nginx never accepts keep-alive connections (it responds with a "Connection: close" header) and we get a smooth run with 0 errors.

Obviously this is not the best configuration, because we don't optimise the number of connections used, and we have a strong feeling that we are missing something ..
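
For reference, a minimal sketch of that workaround (the config file path is an assumption; adjust to wherever your http-context includes live):

```sh
# hypothetical snippet: force keepalive_timeout to 0 so nginx answers every
# request with "Connection: close" and never keeps connections open
cat <<'EOF' > /etc/nginx/conf.d/no-keepalive.conf
keepalive_timeout 0;
EOF
nginx -s reload
```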

@aledbf
Contributor

aledbf commented Jun 2, 2016

@micheleorsi can you test the WINCH signal? (http://nginx.org/en/docs/control.html)
Before that, please check that keepalive_timeout is < 30. This is because the ingress controller waits 30 seconds before terminating the process.
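
One detail worth noting: `nginx -s` only accepts stop, quit, reopen and reload, so WINCH has to be delivered with kill. A minimal sketch, assuming the default pid file location:

```sh
# send SIGWINCH to the nginx master so it gracefully shuts down the worker
# processes; the pid file path is the common default and may differ per image
kill -WINCH "$(cat /var/run/nginx.pid)"
```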

@micheleorsi
Author

Just tried, same behaviour!

Here is our preStop script. (We have an endpoint configured in nginx that just serves the readiness.html file; it is watched by F5 in order to exclude the physical node as soon as the preStop hook has been called.)

```sh
rm /readiness.html                       # fail the F5 health check
sleep 50                                 # give the load balancer time to drain
kill -WINCH "$(cat /var/run/nginx.pid)"  # nginx -s does not accept WINCH
```

.. and this is our kubernetes deployment configuration, for high availability:

```yaml
[..]
spec:
  replicas: 1
  minReadySeconds: 15
  strategy:
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 100
  template:
    spec:
      terminationGracePeriodSeconds: 100
      containers:
[..]
```

This is the output of our ab testing:

```
$ ab -r -k -c 100 -n 1000000 -v 1 'http://<our-test-endpoint>/load?meanTime=2000&variation=0&timeout=60000'

[..]

Concurrency Level:      100
Time taken for tests:   215.718 seconds
Complete requests:      10594
Failed requests:        100
  (Connect: 0, Receive: 0, Length: 100, Exceptions: 0)
Keep-Alive requests:    10494
Total transferred:      15520626 bytes
HTML transferred:       10871784 bytes
Requests per second:    49.11 [#/sec] (mean)
Time per request:       2036.230 [ms] (mean)
Time per request:       20.362 [ms] (mean, across all concurrent requests)
Transfer rate:          70.26 [Kbytes/sec] received

[..]
```

Obviously if you look at the percentages this doesn't seem like a big problem: "just" 100 failures out of the 10,594 completed requests (the run was aimed at 1,000,000). Note that 100 is exactly the concurrency level, which is consistent with every in-flight keep-alive connection being dropped at once. (And the behaviour could be even better with more than 1 replica of the nginx ingress controller pod.)

The biggest point in my opinion is that when nginx dies, it drops all the connections that are in keep-alive status at that specific moment. So if you look at the numbers in that specific window, 100% of the requests fail.

If you have a look at the logs this is the behaviour observed:

  1. the "go wrapper" performs all the operations in the controller.go file, method Stop()
  2. requests continue to flow (even if the IP has been removed from the specific ingress and F5 marked that specific node as "not available")
  3. when the "real" nginx command finally dies it drops all the connections that are still there

I am writing here just because I saw that nginx-slim has been modified quite a lot and I observed this issue on kubernetes with the nginx ingress controller, but it could be an nginx-specific problem.

Next step on my side is to isolate the nginx behaviour outside kubernetes and the "go wrapper".
I will keep you posted.

Thanks!

@aledbf
Contributor

aledbf commented Jul 2, 2016

@micheleorsi any update on this?

@micheleorsi
Author

sorry @aledbf .. not yet!
We focused on production and I didn't have time to investigate this problem. For the moment we always close the TCP connections.

I hope to have some news in the next couple of weeks. I'll keep you posted!

@micheleorsi
Author

Just finished some tests, and indeed nginx itself doesn't gracefully shut down connections that are in keep-alive status.

Here is my test:

```sh
nginx -c sample.conf
```

  • sample.conf is this:

```nginx
events {}

http {
  server {
    listen 9999;

    location / {
      proxy_pass http://localhost:8081;
    }
  }
}
```

  • ab testing running:

```sh
ab -v 2 -n 1000 -c 25 -k http://127.0.0.1:9999/load
```
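
(The proxy_pass target on port 8081 needs some HTTP backend listening; anything answering on that port is sufficient for this test, e.g. this hypothetical stand-in:)

```sh
# hypothetical stand-in for the upstream referenced by proxy_pass;
# any server speaking HTTP on :8081 works for observing the headers
python3 -m http.server 8081
```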

.. then I noticed that when I launch ..

```sh
nginx -s quit
```

.. I continue to receive

```
Connection: keep-alive
```

.. until the very end, right before nginx quits.
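
(A quick way to observe this independently of ab, sketched against the setup above: curl reuses a single connection when given several URLs for the same host, so you can watch the Connection response header while issuing nginx -s quit from another shell.)

```sh
# hypothetical check: three requests over ONE reused connection; the
# Connection header stays "keep-alive" even while nginx is quitting,
# instead of switching to "close" on the final responses
curl -sS -o /dev/null -D - \
  http://127.0.0.1:9999/load http://127.0.0.1:9999/load http://127.0.0.1:9999/load \
  | grep -i '^Connection:'
```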

That's (as explained before) a big problem, since clients that are re-using those connections are never notified via a "Connection: close" HTTP header and so continue to re-use the TCP connection ..
Then, once nginx really quits, those connections are no longer valid and clients get errors.

Now what do you think we can do @aledbf? Probably we should close this issue and notify the nginx developers?
I think that's the right channel to submit this problem, right?

@aledbf
Contributor

aledbf commented Jul 12, 2016

Probably we should close this issue and notify nginx developers?

That's a good idea. Can you open a ticket in nginx with the content of your last comment?
(To keep it simple, please do not mention kubernetes.)
This is the nginx issue tracker: https://trac.nginx.org/nginx/

@micheleorsi
Author

Thanks for your suggestions @aledbf!
I just opened a ticket in nginx trac.

Let me link it here (https://trac.nginx.org/nginx/ticket/1022) for future reference!
