fix pause handling implementation on the controllers #460

rahulbabu95 · 2025-04-30T01:13:24Z

Description

Cluster API v1.9.5 and later include a fix for a race condition in the clusterctl move logic. Currently, during a cluster move operation, CAPT controllers interpret the temporary absence of TinkerbellMachine custom resources in the source cluster as a deletion event. This triggers power-off jobs, which can have catastrophic effects for users, when in reality the resources are only being moved from the source cluster to the target cluster.

This PR implements proper pause handling in both cluster and machine controllers to prevent unwanted reconciliation during cluster move operations. When a CAPI cluster is paused:

Controllers check for pause annotations before proceeding with reconciliation.
Reconciliation is halted if a pause is detected.
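The pause check described above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: it assumes the semantics of Cluster API's `annotations.IsPaused` helper (the owning Cluster's `spec.paused` is true, or the object itself carries the `cluster.x-k8s.io/paused` annotation). The `Cluster` and `TinkerbellCluster` structs and the `isPaused` and `reconcileCluster` functions are hypothetical stand-ins; the real controllers work on Kubernetes objects through controller-runtime.

```go
package main

import "fmt"

// pausedAnnotation is the annotation key Cluster API uses to pause
// reconciliation of an individual object.
const pausedAnnotation = "cluster.x-k8s.io/paused"

// Cluster is a hypothetical, trimmed-down stand-in for the CAPI Cluster.
type Cluster struct {
	Name   string
	Paused bool // mirrors cluster.Spec.Paused
}

// TinkerbellCluster is a hypothetical stand-in for the infrastructure cluster.
type TinkerbellCluster struct {
	Name        string
	Annotations map[string]string
}

// isPaused reports whether reconciliation should be skipped: either the
// owning Cluster is paused or the object carries the paused annotation.
func isPaused(cluster *Cluster, annotations map[string]string) bool {
	if cluster != nil && cluster.Paused {
		return true
	}
	// Indexing a nil map is safe in Go and simply reports "not found".
	_, ok := annotations[pausedAnnotation]
	return ok
}

// reconcileCluster returns early, without touching infrastructure, when paused.
func reconcileCluster(cluster *Cluster, tc *TinkerbellCluster) string {
	if isPaused(cluster, tc.Annotations) {
		return "skipped: paused"
	}
	// Normal reconciliation would happen here.
	return "reconciled"
}

func main() {
	tc := &TinkerbellCluster{Name: "tc"}
	fmt.Println(reconcileCluster(&Cluster{Name: "c", Paused: true}, tc)) // skipped: paused
	fmt.Println(reconcileCluster(&Cluster{Name: "c"}, tc))               // reconciled

	tc.Annotations = map[string]string{pausedAnnotation: ""}
	fmt.Println(reconcileCluster(&Cluster{Name: "c"}, tc)) // skipped: paused
}
```

Because clusterctl move pauses the Clusters it is moving, an early return like this keeps the source-side controllers from acting on objects that are in transit.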

Why is this needed

Fixes: #

How Has This Been Tested?

Tested with a custom-built controller by moving the CRs back and forth between clusters using clusterctl move.

How are existing users impacted? What migration steps/scripts do we need?

Checklist:

I have:

updated the documentation and/or roadmap (if required)
added unit or e2e tests
provided instructions on how to upgrade

controller/cluster/tinkerbellcluster.go

Update pause handling in both cluster and machine controllers to pause further reconciliation when the capi cluster is paused. Signed-off-by: Rahul Ganesh <[email protected]>

Signed-off-by: Rahul Ganesh <[email protected]>

Copilot

Pull Request Overview

This PR improves pause handling in the Tinkerbell controllers to prevent unwanted reconciliation during cluster moves. The changes include:

Adding cluster lookup and pause annotation checks in the machine controller.
Reordering logic in the machine controller to incorporate pause validation.
Introducing pause annotation checks in the cluster controller.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
controller/machine/tinkerbellmachine.go	Adds cluster lookup with pause checking; note the reordering of deletion logic.
controller/cluster/tinkerbellcluster.go	Introduces pause annotation checks with a TODO for future enhancement regarding pause handling.

controller/machine/tinkerbellmachine.go

controller/cluster/tinkerbellcluster.go

rahulbabu95 commented Apr 30, 2025

controller/cluster/tinkerbellcluster.go (resolved)

Improve pause handling implementation on the controllers

8ad3f83

Update pause handling in both cluster and machine controllers to pause further reconciliation when the capi cluster is paused. Signed-off-by: Rahul Ganesh <[email protected]>

rahulbabu95 force-pushed the fix/skip-reconcile-reboot-workflow branch from d44b182 to 8ad3f83 April 30, 2025 01:18

Fix tests and reorder logic to avoid nil panic

1d63d53

Signed-off-by: Rahul Ganesh <[email protected]>

jacobweinstock requested a review from Copilot April 30, 2025 18:55

Copilot AI reviewed Apr 30, 2025

controller/machine/tinkerbellmachine.go (resolved)

controller/cluster/tinkerbellcluster.go (resolved)

jacobweinstock approved these changes Apr 30, 2025

jacobweinstock added the ready-to-merge and kind/bug labels Apr 30, 2025

Merge branch 'main' into fix/skip-reconcile-reboot-workflow

b86f532

mergify bot merged commit 49a734f into tinkerbell:main Apr 30, 2025
9 of 12 checks passed

jacobweinstock changed the title from "Improve pause handling implementation on the controllers" to "fix pause handling implementation on the controllers" Apr 30, 2025

rahulbabu95 deleted the fix/skip-reconcile-reboot-workflow branch April 30, 2025 19:05

rahulbabu95 mentioned this pull request May 6, 2025

Remove hardcoded version pin for capi project in upgrader aws/eks-anywhere-build-tooling#4580

Merged
