Skip to content

Changes to use capabilities CR on supervisor and guest cluster instead of using deprecated ConfigMap #3279

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vdkotkar
Copy link
Contributor

@vdkotkar vdkotkar commented May 15, 2025

What this PR does / why we need it:
Changes to use capabilities CR on supervisor and guest cluster instead of using deprecated ConfigMap for getting FSS values.
Here is the overview of changes:

  1. We need to add CRD, API, RBAC etc. files related to Capabilities locally in CSI code, so that we can access this API.
  2. For supervisor, replace code to fetch capability value from Capabilities CR instead of fetching it from ConfigMap which is deprecated now.
  3. For Guest, read Capabilities CR from supervisor cluster. There is no need to read replicated FSS value from supervisor now.
  4. Add ticker which checks capability value periodically and if value changes from false to true, then restart CSI container for workload-domain-isolation feature. This code can be enhanced for other features with same requirement in future. For other features, instead of restarting CSI driver, we can write some function which will dynamically enable the feature instead of needing to restart CSI container.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Testing done:
Verified changes on Supervisor and Guest cluster. Verified that we can fetch capability values by reading Capabilities CR from both supervisor and guest clusters.

CSI controller logs:
Supervisor:

{"level":"info","time":"2025-05-15T13:09:19.484390553Z","caller":"k8sorchestrator/k8sorchestrator.go:1112","msg":"Feature \"PodVM_On_Stretched_Supervisor_Supported\" is a WCP defined feature state. Reading the capabilities CR \"supervisor-capabilities\".","TraceId":"7058ff0f-caf1-437e-859a-8f1f256f8ee9"}
{"level":"info","time":"2025-05-15T13:09:19.484412005Z","caller":"k8sorchestrator/k8sorchestrator.go:1117","msg":"wcpCpabilitiesMap is empty","TraceId":"7058ff0f-caf1-437e-859a-8f1f256f8ee9"}
{“level":"info","time":"2025-05-15T13:09:19.50657438Z","caller":"k8sorchestrator/k8sorchestrator.go:1067","msg":"successfully fetched capabilities CR instance - &{TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:supervisor-capabilities GenerateName: Namespace
…

{"level":"info","time":"2025-05-15T13:09:25.902387789Z","caller":"k8sorchestrator/k8sorchestrator.go:1112","msg":"Feature \"Workload_Domain_Isolation_Supported\" is a WCP defined feature state. Reading the capabilities CR \"supervisor-capabilities\".","TraceId":"c88ffa92-cdee-49e8-8905-cb79c4043ec6"}
{"level":"info","time":"2025-05-15T13:09:25.902524289Z","caller":"k8sorchestrator/k8sorchestrator.go:1138","msg":"Supervisor capability \"Workload_Domain_Isolation_Supported\" is set to true","TraceId":"c88ffa92-cdee-49e8-8905-cb79c4043ec6"}

Guest:

{"level":"info","time":"2025-05-15T14:26:09.636234545Z","caller":"service/driver.go:113","msg":"Configured: \"csi.vsphere.vmware.com\" with clusterFlavor: \"GUEST_CLUSTER\" and mode: \"controller\"","TraceId":"0c06e786-5783-452e-8110-aeb7245537ab"}
..
{"level":"info","time":"2025-05-15T14:26:09.634202195Z","caller":"k8sorchestrator/k8sorchestrator.go:1067","msg":"successfully fetched capabilities CR instance - &{TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:supervisor-capabilities GenerateName: Namespace:
….
{"level":"info","time":"2025-05-15T14:26:09.635172809Z","caller":"k8sorchestrator/k8sorchestrator.go:1228","msg":"WCP cluster capabilities map - map[Add_NSX_Day2_M1_Supported:false CSI_Detach_Supported:true Control_Plane_Backup_Restore_Supported:true Foundation_Load_Balancer_Supported:true Fronting_CPVM_Services_Via_LB_Supported:false Import_OVF_In_Namespace_CL_Supported:true Load_Balancer_Supported:true MultipleCL_For_TKG_Supported:true Only_Generate_Http_Proxy_Secret:true PodVM_On_DHCP_VDS_Supported:true PodVM_On_Stretched_Supervisor_Supported:true PodVM_On_VDS_Supported:true Resume_Failed_Supervisor_Upgrade_Supported:true Supervisor_New_Folder_Hierarchy_Supported:true TKG_SupervisorService_Supported:true Tanzu_Topology_CR_Supported:true User_Pods_On_VDS_Supported:true VMImage_ResourceNamingStrategy_Supported:true VPC_Supported:true Vdpp_On_Stretched_Supervisor:true Workload_Domain_Isolation_Supported:true supports_Supervisor_Service_allow_list:true supports_VM_service_ISO_deployment:true supports_VM_service_resize_CPU_memory:true supports_easy_sv_wldi:true supports_embedded_Supervisor_Service_ConfigMap:true supports_max_concurrent_dns_forwards:true supports_secure_Supervisor_Service_platform:false supports_supervisor_VCFOps_integration:true supports_supervisor_async_upgrade:true supports_workload_CLI:true]","TraceId":"c2c09562-170c-41df-a78f-b5bd000ba76c"}
….
{"level":"info","time":"2025-05-15T14:26:09.635209647Z","caller":"k8sorchestrator/k8sorchestrator.go:1232","msg":"Supervisor capability \"Workload_Domain_Isolation_Supported\" is set to true","TraceId":"c2c09562-170c-41df-a78f-b5bd000ba76c"}

WCP pre-check-in pipeline with changes:
https://jenkins-vcf-csifvt.devops.broadcom.net/view/Pre-Checkin-CSI/job/csi-wcp-precheckin/160/

GC pre-check-in pipeline with changes:
https://jenkins-vcf-csifvt.devops.broadcom.net/view/Pre-Checkin-CSI/job/gc-csi-precheckin/157/

Special notes for your reviewer:

Release note:

Changes to use capabilities CR on supervisor and guest cluster instead of using deprecated ConfigMap

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 15, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @vdkotkar. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vdkotkar
Once this PR has been reviewed and has the lgtm label, please assign adikul30 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 15, 2025
@vdkotkar vdkotkar changed the title [WIP] Changes to use capabilities CR on supervisor and guest cluster instead of using deprecated ConfigMap Changes to use capabilities CR on supervisor and guest cluster instead of using deprecated ConfigMap May 19, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 19, 2025
@SandeepPissay
Copy link
Contributor

Add ticker which checks capability value periodically and if value changes from false to true, then restart CSI container for workload-domain-isolation feature.

@vdkotkar Restarting CSI will add load to the system since it has to recache objects into memory, unnecessary reconcilers running on all CRs, etc. Could you provide some reasoning for why we should restart CSI and not dynamically enable a feature without restart?

@vdkotkar
Copy link
Contributor Author

vdkotkar commented May 20, 2025

Add ticker which checks capability value periodically and if value changes from false to true, then restart CSI container for workload-domain-isolation feature.

@vdkotkar Restarting CSI will add load to the system since it has to recache objects into memory, unnecessary reconcilers running on all CRs, etc. Could you provide some reasoning for why we should restart CSI and not dynamically enable a feature without restart?

If feature is able to dynamically enable required things without restart, then that can be done. For new features, we will encourage it. Specifically for workload-domain-isolation feature, we were restarting CSI driver before these changes as well, so I kept the same behaviour (this will happen once in a while in cases of upgrade etc.). I have mentioned it in comments as well.

@vdkotkar vdkotkar force-pushed the use_wcp_capabilities_cr branch 3 times, most recently from 257602f to f1f1502 Compare May 26, 2025 12:56
@chethanv28
Copy link
Collaborator

Changes to use capabilities CR on supervisor and guest cluster instead of using deprecated ConfigMap for getting FSS values.

deprecated ConfigMap ==> refers to wcp capability configmap right ? The csi-feature-states configmap will continue to remain in supervisor cluster, correct ?

@vdkotkar vdkotkar force-pushed the use_wcp_capabilities_cr branch from f1f1502 to a34d06b Compare June 2, 2025 09:26
@akankshapanse
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 2, 2025
@vdkotkar
Copy link
Contributor Author

vdkotkar commented Jun 2, 2025

Changes to use capabilities CR on supervisor and guest cluster instead of using deprecated ConfigMap for getting FSS values.

deprecated ConfigMap ==> refers to wcp capability configmap right ? The csi-feature-states configmap will continue to remain in supervisor cluster, correct ?

Yes, wcp capability ConfigMap is deprecated. csi-feature-states configMap is still required for CSI specific feature changes.

@vdkotkar vdkotkar force-pushed the use_wcp_capabilities_cr branch from a34d06b to 770307e Compare June 2, 2025 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants