Healthcheck error - i/o timeout #4210

Closed
richstokes opened this issue Jun 18, 2019 · 7 comments

Comments

@richstokes

richstokes commented Jun 18, 2019

E0618 22:06:17.957750       8 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
E0618 22:06:18.933766       8 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
E0618 22:06:19.165781       8 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: i/o timeout
E0618 22:06:26.601754       8 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
I0618 22:06:28.135501       8 main.go:167] Received SIGTERM, shutting down
I0618 22:06:28.135547       8 nginx.go:358] Shutting down controller queues

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.23.0
  Build:      git-be1329b22
  Repository: https://github.com/kubernetes/ingress-nginx
-------------------------------------------------------------------------------

Image:         quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.23.0

kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-07T09:55:27Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.9", GitCommit:"16236ce91790d4c75b79f6ce96841db1c843e7d2", GitTreeState:"clean", BuildDate:"2019-03-25T06:30:48Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

This is a kops-provisioned cluster running on AWS. The nginx controllers seem fine until they receive some traffic, and then they crash with the above errors. Sometimes the controllers stay up for a while; it seems fairly random. Host resources look OK.

I have also sometimes seen this in the log:

E0618 21:57:07.096186 8 checker.go:57] healthcheck error: 500

The strange thing is that this configuration has been working well for months, and we haven't changed anything. Is there anything else I can check on our side?

args:
  - /nginx-ingress-controller
  - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
  - --default-ssl-certificate=default/le-cert
  - --configmap=$(POD_NAMESPACE)/nginx-configuration
  - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
  - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
  - --publish-service=$(POD_NAMESPACE)/ingress-nginx
  - --annotations-prefix=nginx.ingress.kubernetes.io

@Tisona

Tisona commented Jul 3, 2019

Have you managed to fix this? I have the same issue with 0.24.1.

@w3irdrobot
Contributor

We saw this issue recently. It turned out that our worker nodes were overloaded. We think Kubernetes was trying to keep everything alive but ended up killing other services in the process. We added two nodes and haven't had the problem since.

@richstokes
Author

As a workaround, I increased the health check timeout values. This seemed to give the controller a chance to settle down, and there have been no issues since.

@wmedlar

wmedlar commented Jul 8, 2019

@richstokes what timeouts are working for you?

@richstokes
Author

I just doubled whatever they were set to.
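A sketch of what doubling the probe timeouts might look like on the controller container. The thread doesn't give the original values, so the numbers below are illustrative assumptions (an original `timeoutSeconds: 5`, doubled to 10). The `/healthz` path on port `10254` is also assumed from a typical ingress-nginx deployment; check both against your own manifest.

```yaml
# Hypothetical excerpt from the ingress-nginx controller container spec.
# Values are illustrative: timeoutSeconds is doubled from an assumed original of 5.
livenessProbe:
  httpGet:
    path: /healthz
    port: 10254          # assumed health port (match --healthz-port if changed)
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 10     # doubled from an assumed 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /healthz
    port: 10254
    scheme: HTTP
  periodSeconds: 10
  timeoutSeconds: 10     # doubled from an assumed 5
  failureThreshold: 3
```

Raising `failureThreshold` is another way to give the pod more slack, since the kubelet only restarts the container after that many consecutive liveness probe failures.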

@joshbranham

We tried guaranteeing a whole CPU and 1GB of memory per pod, and that changed nothing. We are also running ModSecurity; I thought the load came from that, but it may be unrelated.
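In Kubernetes, "guaranteeing" resources usually means setting requests equal to limits so the pod gets the `Guaranteed` QoS class. A minimal sketch of the controller container's `resources` block for one CPU and roughly 1GB of memory, assuming the controller is the only container in the pod (every container must have requests equal to limits for the pod to be `Guaranteed`):

```yaml
# Hypothetical resources block for the nginx-ingress-controller container.
# Requests equal to limits for both CPU and memory -> Guaranteed QoS class
# (provided every other container in the pod does the same).
resources:
  requests:
    cpu: "1"
    memory: 1Gi    # ~1GB
  limits:
    cpu: "1"
    memory: 1Gi
```

The resulting class can be confirmed with `kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'`.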

@aledbf
Member

aledbf commented Sep 2, 2019

Closing. This is fixed in master #4487
If you want to test the fix, you can use the image quay.io/kubernetes-ingress-controller/nginx-ingress-controller:dev
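To try the fix, the controller container's image can be pointed at the `dev` tag. A minimal sketch of the relevant Deployment excerpt; the container name is an assumption. Because `dev` is a moving tag, `imagePullPolicy: Always` is set so nodes don't keep running a stale cached copy (for tags other than `latest`, the default policy is `IfNotPresent`).

```yaml
# Hypothetical excerpt from the controller Deployment's pod template.
spec:
  containers:
    - name: nginx-ingress-controller   # assumed container name
      image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:dev
      imagePullPolicy: Always          # "dev" is mutable; pull on every start
```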

6 participants