[server] Investigate server event loop lag #7082
/schedule
@geropl: Cannot schedule issue - issue does not belong to a team. Use /team to specify one.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/schedule
@geropl: Issue scheduled in the meta team (WIP: 0)
/assign
Not fixed yet.
Taking CPU profiles reveals some candidates for improvement, but nothing that explains the accumulating lag yet:
@geropl, happy to discuss findings and provide any details on the first two.
There is now a draft PR addressing the biggest issues mentioned above, but the root cause remains unidentified. 😕 To keep the holidays safe, I will add an alert so that we don't miss anything bad here.
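One way to keep such an alert from paging on short spikes is to fire only when the lag stays above a threshold for several consecutive samples. The sketch below is hypothetical: the `LagAlert` class, the 100 ms threshold and the 3-sample window are assumptions for illustration, not the alert that was actually added.

```typescript
// Hypothetical alert sketch: fires only when event loop lag stays above a
// threshold for `requiredSamples` consecutive samples (similar in spirit to the
// `for:` clause of a Prometheus alerting rule), so brief self-healing spikes
// do not page anyone.
class LagAlert {
  private readonly thresholdMs: number;
  private readonly requiredSamples: number;
  private consecutive = 0;

  constructor(thresholdMs: number, requiredSamples: number) {
    this.thresholdMs = thresholdMs;
    this.requiredSamples = requiredSamples;
  }

  // Feed one lag sample (in ms); returns true while the alert is firing.
  observe(lagMs: number): boolean {
    // Any sample at or below the threshold resets the streak.
    this.consecutive = lagMs > this.thresholdMs ? this.consecutive + 1 : 0;
    return this.consecutive >= this.requiredSamples;
  }
}

// Usage: 100 ms threshold (assumed), must be exceeded 3 samples in a row.
const lagAlert = new LagAlert(100, 3);
const lagSamples = [20, 150, 30, 120, 140, 160, 50];
// Only the sample 160 fires: 150 is an isolated spike, and 50 resets the streak.
console.log(lagSamples.map((s) => lagAlert.observe(s)));
```

Trading detection speed for fewer false pages is the design choice here: a longer window means a sustained lag is reported a few samples later, but a spike that self-heals never reaches on-call.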
Let's not close this until the lag has disappeared.
Removed the priority label because on-call is currently mitigating the user impact. That doesn't mean we're not working on this.
Dropping assignment, as I'm not actively working on this.
Update on what has happened over the last few weeks:
Suggestions:
Received a page moments ago about event loop lag (internal link), which self-healed.
This is more of an alerting/PagerDuty artifact.
Closing this issue because it describes old behavior based on old code. It doesn't make sense to continue here, especially as we can always go back and read up on the history of the prior investigation.
We see an accumulating increase in Node.js event loop lag:

This is bad because the lag adds a minimum (!!!) static delay to every API call.
(Source: Grafana)
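A common way to quantify event loop lag is to schedule a timer and measure how much later than requested it actually fires. The sketch below assumes a simple `setTimeout`-based probe; it is not necessarily how the Grafana metric above is collected, and `measureEventLoopLag` and `blockFor` are hypothetical helpers. It also illustrates why lag acts as a floor on API latency: any callback, including a request handler, has to wait until the synchronous work currently occupying the loop has finished.

```typescript
// Minimal sketch (hypothetical helper): measure how late a timer fires
// compared to when it was due. The extra delay is event loop lag.
function measureEventLoopLag(intervalMs: number): Promise<number> {
  const start = Date.now();
  return new Promise((resolve) => {
    setTimeout(() => {
      // Anything beyond the requested interval is time the callback spent
      // waiting for the event loop to become free.
      const lag = Date.now() - start - intervalMs;
      // Timers may fire a millisecond early due to rounding; clamp at 0.
      resolve(Math.max(0, lag));
    }, intervalMs);
  });
}

// Hypothetical stand-in for synchronous, CPU-bound work that blocks the loop.
function blockFor(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Busy-wait: no other callback (timer, I/O, request handler) can run now.
  }
}

// Usage: schedule a 10 ms probe, then block the loop for 100 ms.
// The probe cannot fire until the block ends, so it reports roughly 90 ms of lag.
const probe = measureEventLoopLag(10);
blockFor(100);
probe.then((lag) => console.log(`event loop lag: ~${lag} ms`));
```

Node.js also ships a histogram-based alternative, `monitorEventLoopDelay` from `perf_hooks`, which samples the delay continuously instead of relying on a single probe.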