Unexpected command output nsenter: cannot open : No such file or directory #15802
Comments
@djdevin, this will probably not be your case, but just in case: I got the same behaviour while upgrading from containerized 3.5 to 3.6. I had a typo in the resource limits and requests for my builds, and every build failed with the error 'docker_sandbox.go:263] NetworkPlugin cni failed on the status hook for pod "php-15-build_coretest": Unexpected command output nsenter: cannot open : No such file or directory'. This was the erratum I had:
The units for memory were configured as 'm' instead of 'Mi'. This typo was ignored in 3.5, but not in 3.6. As I said, this is probably not your case, so feel free to disregard this comment if it does not apply.
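To see why that suffix matters: in Kubernetes resource quantities, 'm' means milli (1/1000 of a unit, sensible for CPU millicores) while 'Mi' means mebi (2^20 bytes, sensible for memory). A minimal sketch of just these two suffixes (not the full quantity grammar; `parse_quantity` is an illustrative helper, not a real Kubernetes API):

```python
def parse_quantity(q: str) -> float:
    """Toy parser for two Kubernetes quantity suffixes (illustration only)."""
    if q.endswith("Mi"):
        return float(q[:-2]) * 2**20   # mebibytes -> bytes
    if q.endswith("m"):
        return float(q[:-1]) / 1000    # milli-units (1/1000)
    return float(q)

print(parse_quantity("512Mi"))  # 536870912.0 bytes
print(parse_quantity("512m"))   # 0.512 -- nonsensical as a memory limit
```

So a memory limit written as '512m' parses to roughly half a byte rather than 512 MiB, which explains why a stricter 3.6 would choke on a value 3.5 silently tolerated.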
Wow, that is exactly the case. Thanks! I'll leave this open, though, because the documentation is out of date for both OCP and Origin in some places: https://docs.openshift.org/latest/install_config/build_defaults_overrides.html I noticed it had been updated elsewhere, but we use the Ansible playbooks for deployments, so I copied the examples from the documentation.
Yes, I just reviewed it and both repositories have this erratum. I will change that and open a PR, and will update this issue with the info so you can close it once it is solved.
The following PR has been created: openshift/openshift-ansible#5135
I'm facing exactly the same error, but only once in a while.
This is a fresh 3.6.0 install from Ansible.
A little bit more context:
Restarting Docker resolved the issue.
In fact, it's happening on all node hosts. There are pods
@caruccio This issue is related only to Builds, as there was a typo in the documentation around default resources for Builds. As far as I can see, you are facing this issue with deployments, so your typo is probably configured in a different place. Check whether you have configured resource limits anywhere else, or just test the deployment in a new namespace with no resource limits applied and see what happens.
@makentenza I've tried in a new project with no limits/requests, as you suggested, but I'm still getting the same result:
Note that there are no limits on the pod definition.
I believe I've figured it out. That nsenter error message appears after the pod's sandbox container is destroyed, so nsenter can't find the pod's network namespace via /proc//ns/net (the PID is gone) when reporting pod status to the controller (I guess).
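A minimal sketch of that failure mode, under the assumption above: the status hook builds a path like /proc/&lt;pid&gt;/ns/net, and with the PID missing the resulting path can't be opened. (This is an illustrative reconstruction, not the actual CNI/kubelet code; `pid` and `path` are hypothetical names.)

```python
# Hypothetical reconstruction: when the sandbox container is already
# gone, the PID interpolated into the netns path is empty, so opening
# the resulting path fails with ENOENT -- the same class of error
# nsenter reports in the logs above.
pid = ""  # sandbox container already destroyed, so no PID is available
path = f"/proc/{pid}/ns/net"  # becomes "/proc//ns/net"
try:
    open(path)
except FileNotFoundError as e:
    print(f"cannot open {path}: {e.strerror}")
```

This would explain why the message is cosmetic: by the time the status hook runs, the pod is gone and there is simply no namespace file left to open.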
I think it's likely that we shouldn't be reporting this error. @sjenning, this is something we should open upstream.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting. If this issue is safe to close now, please do so. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting. If this issue is safe to close now, please do so. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting. /close
I am getting this error in the node log, but also "Error syncing pod" and "Pod sandbox changed, it will be killed and re-created." in the "Events" for this build pod.
This happens when a build pod is launched on that node; pods launched on the master don't display this issue.
Version
oc v3.6.0+c4dd4cf
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://xyz:8443
openshift v3.6.0+c4dd4cf
kubernetes v1.6.1+5115d708d7
Additional Information
This was an upgrade from containerized v1.5.1 -> v3.6.0 (no RCs in between) using the upgrade playbook in the release-3.6 branch of openshift-ansible, which completed successfully.
This appeared to be fixed by #15210, but that doesn't help here; I've verified that the most recent node image is running and /opt is populated in the node container.
I have destroyed the node in question and reinstalled it with the Ansible scale playbook, and that did not help.
oc adm diagnostics
only returned this error. So it looks like pod connectivity is broken, but for the life of me I can't figure out why.
I could simply reinstall, as this is a test environment, but this is going to production eventually, so I'd like to see if there's a way to fix it if we encounter it there.