du and find on following dirs took long time #2212
Comments
try running with
I've tried it, but it's still stuck. I wonder if there are other file system operations involved.
What do you mean by stuck? What happens?
The "/metrics" endpoint is not accessible or is very slow, and it crashed very quickly.
Can you share the logs from when that happens?
I also encountered the same errors below and had to restart the affected master node. What is the root cause? Is it Docker? k8s: 1.13.0. If you notice, there are failures in the logs, but they are reported as info. LOGS:
@dcvtruong It looks like you have many symptoms, including slow du/find calls and timeouts to docker. Most of the time, the root cause is resource constraints, including CPU and disk I/O contention. This can be made worse if the writable layers of the containers are very large.
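For reference, a quick way to check whether a container's writable layer has grown large is sketched below. The `<container-id>` placeholder is illustrative, and note that on this node the Docker data root is /vdata/docker rather than the default /var/lib/docker.

```sh
# Per-container size of the writable layer ("SIZE" column); the "virtual"
# figure also includes the read-only image layers.
docker ps --size

# Measure one container's overlay2 upper (writable) directory directly;
# <container-id> is a placeholder.
du -sh "$(docker inspect --format '{{ .GraphDriver.Data.UpperDir }}' <container-id>)"
```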
More RAM was added to the node to handle more requests.
Hi @dashpole @TonySun1994 @dcvtruong, I hit the same issue.
After troubleshooting, two long-running processes with high CPU usage were found in pods hosting WordPress websites. It was initially reported as a cAdvisor issue, but the fact that it is only observed under high CPU load made it harder to detect.
This was exactly how the cluster behaved: any kubectl command appeared stuck, but if left in the terminal, it would eventually complete. After waiting for the pods with the virus to be deleted, the cluster recovered to normal operation in the case described.
Running cAdvisor in a container, the log shows the following:
I0328 13:41:29.716552 1 fsHandler.go:135] du and find on following dirs took 13.006500097s: [/rootfs/vdata/docker/overlay2/1259cbee3d31494942129513374cc8702ad68c676b700638534a52b19aadcb26/diff /rootfs/vdata/docker/containers/d866dd5101d9788bdf71a7aacd012bae15c2c304a4b70c80b2647c05a7ebc050]; will not log again for this container unless duration exceeds 5s
I0328 13:41:32.106341 1 fsHandler.go:135] du and find on following dirs took 13.308135764s: [/rootfs/vdata/docker/overlay2/5dfa8775376b9725b0af27edb4ba93a2a9da368d4099633b49f6dba3ae5e6a70/diff /rootfs/vdata/docker/containers/50a1a11de04b411c270683dc834696d1150433120d44deb70bae3ba109a54b06]; will not log again for this container unless duration exceeds 6s
I0328 13:41:32.204471 1 fsHandler.go:135] du and find on following dirs took 15.599689255s: [/rootfs/vdata/docker/overlay2/102debde6d41426b202167877b92bd61b4d4d2f3f9b67c9c4241b760898fa9cb/diff /rootfs/vdata/docker/containers/45b76dbc340d3bb358ec9ec08d5341822575ed68245ea2976596bf17dc2d3a9f]; will not log again for this container unless duration exceeds 6s
I0328 13:41:32.804843 1 fsHandler.go:135] du and find on following dirs took 14.19422831s: [/rootfs/vdata/docker/overlay2/a426d4717c6454ed568e028f95d055f86cebaba26048ac09c019a758d9456158/diff /rootfs/vdata/docker/containers/32d2324f8f4f5727eaf61579148f8915e6019361f032ddcc11c66cb9685b11ff]; will not log again for this container unless duration exceeds 6s
I0328 13:42:15.917654 1 fsHandler.go:135] du and find on following dirs took 11.919525783s: [/rootfs/vdata/docker/overlay2/f2d7561691dd2f3aaee9bf753a152c793edb987717d9e56381df65e4532ec4ee/diff /rootfs/vdata/docker/containers/4cec80a94dabed251376f50c9b24a578d0a6525bfbb096ce73b63adff97db315]; will not log again for this container unless duration exceeds 2s
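For context, the fsHandler log lines above come from cAdvisor walking each container's writable layer with du and find. A rough way to reproduce the timing by hand is sketched below; the exact find invocation cAdvisor uses may differ, and the /rootfs prefix only applies from inside the cAdvisor container (on the host the path starts at /vdata/docker).

```sh
# Approximate the work cAdvisor's fsHandler does for one container
# (path taken from the first log line above):
DIR=/rootfs/vdata/docker/overlay2/1259cbee3d31494942129513374cc8702ad68c676b700638534a52b19aadcb26/diff

time du -s "$DIR"          # bytes used by the writable layer
time find "$DIR" | wc -l   # number of files/inodes that must be visited
```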
I know this is caused by the deep file system hierarchy and the large number of small files, so I want to disable metrics collection for the file system and disk and focus on collecting CPU and memory. Right now the web service cannot be started and "/metrics" cannot be accessed.
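As a sketch of one way to do this: depending on the cAdvisor version, the disk-usage collector can usually be turned off with the --disable_metrics flag. The image name, mounts, and accepted flag values below are illustrative and should be checked against cadvisor --help for the version in use.

```sh
# Run cAdvisor with the disk usage collector disabled so fsHandler never
# runs du/find on the overlay2 and container directories; CPU and memory
# metrics are still collected.
docker run -d --name=cadvisor \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  -p 8080:8080 \
  google/cadvisor:latest \
  --disable_metrics=disk
```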