
[perf] MCAD constantly throttled #434


Open

kpouget opened this issue Jun 26, 2023 · 3 comments

Comments

@kpouget

kpouget commented Jun 26, 2023

When looking at the MCAD logs, I see that it is constantly being throttled, and it seems to be requesting every API group/version (including all the CRDs) available in the cluster:

I0626 13:23:58.178716       1 request.go:591] Throttling request took 537.935392ms, request: GET:https://172.30.0.1:443/apis/monitoring.coreos.com/v1alpha1?timeout=32s
I0626 13:23:58.189164       1 request.go:591] Throttling request took 548.378517ms, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1alpha1?timeout=32s
I0626 13:23:58.198620       1 request.go:591] Throttling request took 557.834296ms, request: GET:https://172.30.0.1:443/apis/scheduling.k8s.io/v1?timeout=32s
I0626 13:23:58.209032       1 request.go:591] Throttling request took 568.24236ms, request: GET:https://172.30.0.1:443/apis/imageregistry.operator.openshift.io/v1?timeout=32s
I0626 13:23:58.218479       1 request.go:591] Throttling request took 577.692789ms, request: GET:https://172.30.0.1:443/apis/serving.kserve.io/v1alpha1?timeout=32s
I0626 13:23:58.229013       1 request.go:591] Throttling request took 588.223945ms, request: GET:https://172.30.0.1:443/apis/network.openshift.io/v1?timeout=32s
I0626 13:23:58.238502       1 request.go:591] Throttling request took 597.708975ms, request: GET:https://172.30.0.1:443/apis/autoscaling.openshift.io/v1beta1?timeout=32s
I0626 13:23:58.249066       1 request.go:591] Throttling request took 608.272682ms, request: GET:https://172.30.0.1:443/apis/coordination.k8s.io/v1?timeout=32s
I0626 13:23:58.258451       1 request.go:591] Throttling request took 617.672599ms, request: GET:https://172.30.0.1:443/apis/helm.openshift.io/v1beta1?timeout=32s
I0626 13:23:58.268227       1 request.go:591] Throttling request took 627.428484ms, request: GET:https://172.30.0.1:443/apis/cloud.network.openshift.io/v1?timeout=32s
I0626 13:23:58.278857       1 request.go:591] Throttling request took 638.056671ms, request: GET:https://172.30.0.1:443/apis/node.k8s.io/v1?timeout=32s
I0626 13:23:58.288982       1 request.go:591] Throttling request took 648.181891ms, request: GET:https://172.30.0.1:443/apis/kubeflow.org/v1beta1?timeout=32s
I0626 13:23:58.298386       1 request.go:591] Throttling request took 657.590819ms, request: GET:https://172.30.0.1:443/apis/network.operator.openshift.io/v1?timeout=32s
I0626 13:23:58.308934       1 request.go:591] Throttling request took 668.127925ms, request: GET:https://172.30.0.1:443/apis/discovery.k8s.io/v1?timeout=32s
I0626 13:23:58.318269       1 request.go:591] Throttling request took 677.466823ms, request: GET:https://172.30.0.1:443/apis/cloudcredential.openshift.io/v1?timeout=32s
I0626 13:23:58.328651       1 request.go:591] Throttling request took 687.847126ms, request: GET:https://172.30.0.1:443/apis/operators.coreos.com/v2?timeout=32s
I0626 13:23:58.338012       1 request.go:591] Throttling request took 697.203084ms, request: GET:https://172.30.0.1:443/apis/flowcontrol.apiserver.k8s.io/v1beta2?timeout=32s
I0626 13:23:58.348316       1 request.go:591] Throttling request took 707.516076ms, request: GET:https://172.30.0.1:443/apis/performance.openshift.io/v2?timeout=32s
I0626 13:23:58.358629       1 request.go:591] Throttling request took 717.817759ms, request: GET:https://172.30.0.1:443/apis/operators.coreos.com/v1?timeout=32s
I0626 13:23:58.368962       1 request.go:591] Throttling request took 728.145512ms, request: GET:https://172.30.0.1:443/apis/flowcontrol.apiserver.k8s.io/v1beta1?timeout=32s
I0626 13:23:58.378359       1 request.go:591] Throttling request took 737.54763ms, request: GET:https://172.30.0.1:443/apis/migration.k8s.io/v1alpha1?timeout=32s
I0626 13:23:58.388688       1 request.go:591] Throttling request took 747.866993ms, request: GET:https://172.30.0.1:443/apis/config.openshift.io/v1?timeout=32s
I0626 13:23:58.398111       1 request.go:591] Throttling request took 757.287052ms, request: GET:https://172.30.0.1:443/apis/kfdef.apps.kubeflow.org/v1?timeout=32s
I0626 13:23:58.408403       1 request.go:591] Throttling request took 767.582294ms, request: GET:https://172.30.0.1:443/apis/machine.openshift.io/v1?timeout=32s
I0626 13:23:58.418845       1 request.go:591] Throttling request took 778.011698ms, request: GET:https://172.30.0.1:443/apis/apps.openshift.io/v1?timeout=32s
I0626 13:23:58.428329       1 request.go:591] Throttling request took 787.499978ms, request: GET:https://172.30.0.1:443/apis/machine.openshift.io/v1beta1?timeout=32s
I0626 13:23:58.438809       1 request.go:591] Throttling request took 797.974212ms, request: GET:https://172.30.0.1:443/apis/authorization.openshift.io/v1?timeout=32s
I0626 13:23:58.448058       1 request.go:591] Throttling request took 807.243089ms, request: GET:https://172.30.0.1:443/apis/kubeflow.org/v1alpha1?timeout=32s
I0626 13:23:58.458594       1 request.go:591] Throttling request took 817.752805ms, request: GET:https://172.30.0.1:443/apis/build.openshift.io/v1?timeout=32s
I0626 13:23:58.467979       1 request.go:591] Throttling request took 827.137372ms, request: GET:https://172.30.0.1:443/apis/oauth.openshift.io/v1?timeout=32s
I0626 13:23:58.478054       1 request.go:591] Throttling request took 837.214141ms, request: GET:https://172.30.0.1:443/apis/performance.openshift.io/v1?timeout=32s
I0626 13:23:58.488482       1 request.go:591] Throttling request took 847.641595ms, request: GET:https://172.30.0.1:443/apis/project.openshift.io/v1?timeout=32s
I0626 13:23:58.498475       1 request.go:591] Throttling request took 857.702755ms, request: GET:https://172.30.0.1:443/apis/codeflare.codeflare.dev/v1alpha1?timeout=32s
I0626 13:23:58.507913       1 request.go:591] Throttling request took 867.086812ms, request: GET:https://172.30.0.1:443/apis/kubeflow.org/v1?timeout=32s

This cannot be good for performance.
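For context, the `?timeout=32s` on every request matches the default timeout that client-go's discovery client applies, so these look like full API discovery calls: one GET per group/version. A minimal Go sketch (illustrative only, not MCAD's actual code) of the kind of call that produces this fan-out:

```go
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster config, as a controller running in a pod would use.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// ServerGroupsAndResources issues one GET per API group/version --
	// on a cluster with many CRDs this quickly exhausts the client-side
	// rate limiter and triggers the "Throttling request took ..." logs.
	groups, _, err := dc.ServerGroupsAndResources()
	if err != nil {
		panic(err)
	}
	fmt.Printf("discovered %d API groups\n", len(groups))
}
```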

@asm582
Member

asm582 commented Jun 26, 2023

Agreed, we need to remove unused controllers in MCAD. Here is a PR that we can revive: #277

@asm582 asm582 changed the title [perf] MCAD constantly thottled [perf] MCAD constantly throttled Aug 2, 2023
@asm582
Member

asm582 commented Aug 24, 2023

@astefanutti MCAD from the main branch has some remediations around this problem; can you recommend what more could be improved?

@astefanutti
Contributor

@asm582 these messages seem to be caused by excessive usage of the discovery API, which is being client-side throttled, even though the QPS and maximum burst limits have already been increased.
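For reference, those client-side limits live on the `rest.Config`. A sketch of where they would be raised (the values here are illustrative, not what MCAD actually sets):

```go
import "k8s.io/client-go/rest"

func configWithRaisedLimits() (*rest.Config, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	// client-go defaults to QPS=5 and Burst=10; raising them delays the
	// throttling but does not remove the underlying discovery fan-out.
	cfg.QPS = 50    // illustrative value
	cfg.Burst = 100 // illustrative value
	return cfg, nil
}
```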

We could speculatively close this, given the large refactoring that has happened lately, but a quick search in the code points to the genericresource.go file, which calls the discovery API to map the generic resources' GVK to the API resource.

My suggestion would be to look at it more closely and consider putting a caching or rate-limiting mechanism in place for consuming the discovery API.
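As a sketch of what that caching could look like (using client-go's memory-backed cached discovery client and deferred REST mapper, not MCAD's current code):

```go
import (
	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/discovery/cached/memory"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/restmapper"
)

// newCachedRESTMapper builds a REST mapper backed by an in-memory
// discovery cache, so repeated GVK-to-resource lookups do not hit
// the API server every time.
func newCachedRESTMapper(cfg *rest.Config) (meta.RESTMapper, error) {
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		return nil, err
	}
	cached := memory.NewMemCacheClient(dc)
	// The deferred mapper lazily populates from the cache and only
	// invalidates it when a lookup misses (e.g. a newly installed CRD).
	return restmapper.NewDeferredDiscoveryRESTMapper(cached), nil
}

// Example lookup: resolve a GVK to its REST mapping via the cache.
func lookup(m meta.RESTMapper, gvk schema.GroupVersionKind) (*meta.RESTMapping, error) {
	return m.RESTMapping(gvk.GroupKind(), gvk.Version)
}
```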
