output kube-system logs from workload clusters #1121
Conversation
why not use the … here? (edit: this might not be possible, just wondering if it is...)

    aboveMachinesPath := strings.Replace(outputPath, "/machines", "", 1)
    workload := acp.GetWorkloadCluster(ctx, namespace, name)
    pods := &corev1.PodList{}
    Expect(workload.GetClient().List(ctx, pods, client.InNamespace(kubesystem))).To(Succeed())
None of this is CAPZ-specific, and it would be useful to have kube-system pod logs for all CAPI clusters. Would it make sense to make this change directly in the framework's log collector instead?
It would, but I want it today, not in the next CAPI release :).
I will gladly open a similar PR in CAPI.
@CecileRobertMichon, great idea. That was the first thing I tried to do, as it seemed like the right extension point for this functionality. Unfortunately, the … I'd like to add another hook to this collector for collecting cluster-level logs. Possibly also a hook for setting up namespace watches when the cluster first starts, so that the collector could start listening to logs (or do any other logging setup) before waiting on the workload cluster to be built and all of the workloads to be in a ready state, as it is today with … wdyt?
Sounds great. Let's do both: this PR can help us short term to debug the test failures, and we can add the hook to the CAPI framework in parallel. Might also be cool to get/describe the pods in kube-system so we get a quick view to know if anything was in a crashloop or error state.
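The get/describe suggestion above can be done with plain kubectl against the workload cluster's kubeconfig. A sketch of what that might look like (the kubeconfig path is a hypothetical placeholder, not something from this PR):

```shell
# Hypothetical path; CAPI writes a kubeconfig per workload cluster.
KUBECONFIG=./workload-cluster.kubeconfig

# Quick view of kube-system pod states (CrashLoopBackOff, Error, etc.).
kubectl --kubeconfig "$KUBECONFIG" get pods -n kube-system -o wide

# Detailed events and container statuses for anything that looks unhealthy.
kubectl --kubeconfig "$KUBECONFIG" describe pods -n kube-system
```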
/retest
/test pull-cluster-api-provider-azure-e2e
pull-cluster-api-provider-azure-e2e failed due to GatewayTimeout and 504 errors. Seems really odd. Retrying.
/retest
last failure was unrelated to LB flake
/retest
/retest
updated k8s to 1.19.7 in #1126
I'm going to trim out all of the testing-related stuff that was added to this PR.
/hold
/hold cancel
Trimmed down the PR to output verbose cloud provider logs and gather all of kube-system for the workload clusters.
/assign @CecileRobertMichon @nader-ziada
/lgtm
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
/retest
/retest
/retest
What type of PR is this?
/kind feature
What this PR does / why we need it:
It's tough to diagnose what's happening in some flaky e2e tests. By collecting kube-system logs we can get a better idea of why tests are failing.
This PR intercepts the call to CollectWorkloadClusterLogs so that it can inject code to pull all of the logs for the kube-system pods. Also, added slightly higher verbosity to controller manager logging (v=4).
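The interception described above can be sketched as a wrapper around the framework's log collector. This is a minimal, self-contained illustration of the pattern, not the actual CAPZ change: the `LogCollector` interface, both collector types, and the returned strings are simplified stand-ins (the real CAPI hook takes a context, a cluster proxy, and an output path).

```go
package main

import "fmt"

// LogCollector models the framework's log-collection hook (a hypothetical
// simplification of the real CAPI framework interface).
type LogCollector interface {
	CollectWorkloadClusterLogs(cluster string) []string
}

// baseCollector stands in for the upstream framework collector, which
// gathers per-machine logs.
type baseCollector struct{}

func (baseCollector) CollectWorkloadClusterLogs(cluster string) []string {
	return []string{"machine logs for " + cluster}
}

// kubeSystemCollector wraps a base collector and additionally gathers
// kube-system pod logs, mirroring the interception this PR describes.
type kubeSystemCollector struct {
	base LogCollector
}

func (c kubeSystemCollector) CollectWorkloadClusterLogs(cluster string) []string {
	logs := c.base.CollectWorkloadClusterLogs(cluster)
	// In the real change this would list pods in the kube-system namespace
	// of the workload cluster and stream each container's logs to disk.
	logs = append(logs, "kube-system pod logs for "+cluster)
	return logs
}

func main() {
	var c LogCollector = kubeSystemCollector{base: baseCollector{}}
	for _, l := range c.CollectWorkloadClusterLogs("capz-e2e") {
		fmt.Println(l)
	}
}
```

The wrapper satisfies the same interface as the base collector, so the test framework can use it without knowing extra logs are being gathered.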
Special notes for your reviewer:
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
TODOs:
Release note: