Unexpected command output nsenter: cannot open : No such file or directory #15802
Comments
@djdevin, this will probably not be your case, but just in case: I got the same behaviour while upgrading from containerized 3.5 to 3.6. I had a typo in the resource limits and requests for my builds, and every build failed with the error 'docker_sandbox.go:263] NetworkPlugin cni failed on the status hook for pod "php-15-build_coretest": Unexpected command output nsenter: cannot open : No such file or directory'. This was the erratum I had:
The units for memory were configured as 'm' instead of 'Mi'. This typo was ignored in 3.5, but not in 3.6. As I said, this is probably not your case, so feel free to disregard this comment if it does not apply.
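To see why that suffix matters: in Kubernetes resource quantities, 'm' means milli (1/1000 of a unit, sensible for CPU millicores) while 'Mi' means mebi (2^20 bytes, sensible for memory). A minimal sketch of just these two suffixes (not the full quantity grammar; `parse_quantity` is an illustrative helper, not a real Kubernetes API):

```python
def parse_quantity(q: str) -> float:
    """Toy parser for two Kubernetes quantity suffixes (illustration only)."""
    if q.endswith("Mi"):
        return float(q[:-2]) * 2**20   # mebibytes -> bytes
    if q.endswith("m"):
        return float(q[:-1]) / 1000    # milli-units (1/1000)
    return float(q)

print(parse_quantity("512Mi"))  # 536870912.0 bytes
print(parse_quantity("512m"))   # 0.512 -- nonsensical as a memory limit
```

So a memory limit written as '512m' parses to roughly half a byte rather than 512 MiB, which explains why a stricter 3.6 would choke on a value 3.5 silently tolerated.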
Wow, that is exactly the case. Thanks! I'll leave this open, though, because the documentation is out of date for both OCP and Origin in some places: https://docs.openshift.org/latest/install_config/build_defaults_overrides.html I noticed it had been updated elsewhere, but we use the Ansible playbooks for deployments, so I copied the examples from the documentation.
Yes, I just reviewed it and both repositories have this erratum. I will change that and open a PR, and will update this issue with the info so you can close it once it is solved.
The following PR has been created: openshift/openshift-ansible#5135
I'm facing exactly the same error, but only once in a while.
This is a fresh 3.6.0 install from Ansible.
A little bit more context:
Restarting Docker resolved the issue.
In fact, it's happening on all node hosts. There are pods
@caruccio This issue is related only to Builds, as there was a typo in the documentation around default resources for Builds. As far as I can see, you are facing this issue with deployments, so your typo is probably configured in a different place. Check whether you have configured resource limits anywhere else, or just test the deployment in a new namespace with no resource limits applied and see what happens.
@makentenza I've tried in a new project with no limits/requests, as you suggested, but I'm still getting the same result:
Note that there are no limits on the pod definition.
I believe I've figured it out. That nsenter error message appears after the pod's sandbox container is destroyed, so nsenter can't find the pod's network namespace via /proc//ns/net (the PID is gone) when reporting pod status to the controller (I guess).
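A minimal sketch of that failure mode, under the assumption above: the status hook builds a path like /proc/&lt;pid&gt;/ns/net, and with the PID missing the resulting path can't be opened. (This is an illustrative reconstruction, not the actual CNI/kubelet code; `pid` and `path` are hypothetical names.)

```python
# Hypothetical reconstruction: when the sandbox container is already
# gone, the PID interpolated into the netns path is empty, so opening
# the resulting path fails with ENOENT -- the same class of error
# nsenter reports in the logs above.
pid = ""  # sandbox container already destroyed, so no PID is available
path = f"/proc/{pid}/ns/net"  # becomes "/proc//ns/net"
try:
    open(path)
except FileNotFoundError as e:
    print(f"cannot open {path}: {e.strerror}")
```

This would explain why the message is cosmetic: by the time the status hook runs, the pod is gone and there is simply no namespace file left to open.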
I think it's likely that we shouldn't be reporting this error. @sjenning, this is something we should open upstream.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting. If this issue is safe to close now, please do so. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting. If this issue is safe to close now, please do so. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting. /close
I am getting this error in the node log, but also "Error syncing pod" and "Pod sandbox changed, it will be killed and re-created." in the "Events" for this build pod.
This happens when a build pod is launched on that node; pods launched on the master don't display this issue.
Version
oc v3.6.0+c4dd4cf
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://xyz:8443
openshift v3.6.0+c4dd4cf
kubernetes v1.6.1+5115d708d7
Additional Information
This was an upgrade from containerized v1.5.1 -> v3.6.0 (no RCs in between) using the upgrade playbook in the release-3.6 branch of openshift-ansible, which completed successfully.
This appeared to be fixed by #15210, but that doesn't help here; I've verified that the most recent node image is running and /opt is populated in the node container.
I have destroyed the node in question and reinstalled it with the Ansible scale playbook, and that did not help.
oc adm diagnostics
only returned this error. So it looks like pod connectivity is broken, but for the life of me I can't figure out why.
I could simply reinstall, as this is a test environment, but this is going to production eventually, so I'd like to see if there's a way to fix it if we encounter it there.