Skip to content

fix pause handling implementation on the controllers #460

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

rahulbabu95
Copy link

@rahulbabu95 rahulbabu95 commented Apr 30, 2025

Description

Cluster-API version (>=1.9.5) incorporated a fix for a race condition in their clusterctl move logic. Currently, during a cluster move operation, CAPT controllers interpret the temporary absence of TinkerbellMachine CRDs in the source cluster as a deletion event. This triggers power-off jobs, potentially causing catastrophic effects for users, when in reality the resources are just being moved from source to target cluster.

This PR implements proper pause handling in both cluster and machine controllers to prevent unwanted reconciliation during cluster move operations. When a CAPI cluster is paused:

  • Controllers check for pause annotations before proceeding with reconciliation
  • Reconciliation is halted if pause is detected

Why is this needed

Fixes: #

How Has This Been Tested?

Tested with a custom built controller and moving the CRs back and forth using clusterctl move.

How are existing users impacted? What migration steps/scripts do we need?

Checklist:

I have:

  • updated the documentation and/or roadmap (if required)
  • added unit or e2e tests
  • provided instructions on how to upgrade

Update pause handling in both cluster and machine controllers to
pause further reconciliation when the capi cluster is paused.

Signed-off-by: Rahul Ganesh <[email protected]>
@rahulbabu95 rahulbabu95 force-pushed the fix/skip-reconcile-reboot-workflow branch from d44b182 to 8ad3f83 Compare April 30, 2025 01:18
@jacobweinstock jacobweinstock requested a review from Copilot April 30, 2025 18:55
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves pause handling in the Tinkerbell controllers to prevent unwanted reconciliation during cluster moves. The changes include:

  • Adding cluster lookup and pause annotation checks in the machine controller.
  • Reordering logic in the machine controller to incorporate pause validation.
  • Introducing pause annotation checks in the cluster controller.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
controller/machine/tinkerbellmachine.go Adds cluster lookup with pause checking; note the reordering of deletion logic.
controller/cluster/tinkerbellcluster.go Introduces pause annotation checks with a TODO for future enhancement regarding pause handling.

@jacobweinstock jacobweinstock added ready-to-merge Signal to Mergify to merge the PR. kind/bug Categorizes issue or PR as related to a bug. labels Apr 30, 2025
@mergify mergify bot merged commit 49a734f into tinkerbell:main Apr 30, 2025
9 of 12 checks passed
@jacobweinstock jacobweinstock changed the title Improve pause handling implementation on the controllers fix pause handling implementation on the controllers Apr 30, 2025
@rahulbabu95 rahulbabu95 deleted the fix/skip-reconcile-reboot-workflow branch April 30, 2025 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. ready-to-merge Signal to Mergify to merge the PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants