Skip to content

Fix: EC2 controller for vpcendpoint does not update status #191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 22, 2024

Conversation

nnbu
Copy link
Contributor

@nnbu nnbu commented May 21, 2024

Issue : aws-controllers-k8s/community#2075

Description of changes:
The issue is that vpce CR status is not updated. This happens because when vpce is created, it takes around a minute for it to be created in aws. Due to this, it's status is kept pending. But there is no sync enforced on the reconciler. Due to this, reconciliation happens after 10h default time. At this time, status would be set correctly.

This fix enforces reconciliation to make sure CR is updated correctly when vpce creation in aws is successful.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link

ack-prow bot commented May 21, 2024

Hi @nnbu. Thanks for your PR.

I'm waiting for a aws-controllers-k8s member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ack-prow ack-prow bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 21, 2024
@ack-prow ack-prow bot requested review from a-hilaly and vijtrip2 May 21, 2024 22:24
@a-hilaly
Copy link
Member

/ok-to-test

@ack-prow ack-prow bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 21, 2024
Copy link
Member

@a-hilaly a-hilaly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thank you @nnbu !
Shall we make a release for this?

build_date: "2024-05-21T22:16:46Z"
build_hash: d660ee36fe947607ebea039acd47c35477b4a836
go_version: go1.21.1
version: v0.28.0-58-gd660ee3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can please you use the latest version of code-gen?
/hold

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ack-prow ack-prow bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved labels May 21, 2024
@a-hilaly
Copy link
Member

@nnbu can you also update the e2e tests? i think we might have to wait more than 5x30 seconds
/retest

Issue : aws-controllers-k8s/community#2075

Description of changes:
The issue is that vpce CR status is not updated. This happens because when vpce is created, it takes around a minute for it to be created in aws. Due to this, it's status is kept pending. But there is no sync enforced on the reconciler. Due to this, reconciliation happens after 10h default time. At this time, status would be set correctly.

This fix enforces reconciliation to make sure CR is updated correctly when vpce creation in aws is successful.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
@nnbu
Copy link
Contributor Author

nnbu commented May 22, 2024

@nnbu can you also update the e2e tests? i think we might have to wait more than 5x30 seconds /retest

@a-hilaly this is done now

Copy link
Member

@a-hilaly a-hilaly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @nnbu !
/lgtm
/test all

@ack-prow ack-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 22, 2024
@a-hilaly
Copy link
Member

/unhold

@ack-prow ack-prow bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 22, 2024
Copy link

ack-prow bot commented May 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: a-hilaly, nnbu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ack-prow ack-prow bot merged commit 34439b0 into aws-controllers-k8s:main May 22, 2024
6 checks passed
a-hilaly pushed a commit to a-hilaly/ack-ec2-controller that referenced this pull request May 31, 2024
…ollers-k8s#191)

Issue : aws-controllers-k8s/community#2075

Description of changes:
The issue is that vpce CR status is not updated. This happens because when vpce is created, it takes around a minute for it to be created in aws. Due to this, it's status is kept `pending`. But there is no sync enforced on the reconciler. Due to this, reconciliation happens after 10h default time. At this time, status would be set correctly.

This fix enforces reconciliation to make sure CR is updated correctly when vpce creation in aws is successful.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
a-hilaly pushed a commit to a-hilaly/ack-ec2-controller that referenced this pull request May 31, 2024
…ollers-k8s#191)

Issue : aws-controllers-k8s/community#2075

Description of changes:
The issue is that vpce CR status is not updated. This happens because when vpce is created, it takes around a minute for it to be created in aws. Due to this, it's status is kept `pending`. But there is no sync enforced on the reconciler. Due to this, reconciliation happens after 10h default time. At this time, status would be set correctly.

This fix enforces reconciliation to make sure CR is updated correctly when vpce creation in aws is successful.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
a-hilaly added a commit to a-hilaly/ack-ec2-controller that referenced this pull request May 31, 2024
…ollers-k8s#191)

Issue : aws-controllers-k8s/community#2075

Description of changes:
The issue is that vpce CR status is not updated. This happens because when vpce is created, it takes around a minute for it to be created in aws. Due to this, it's status is kept `pending`. But there is no sync enforced on the reconciler. Due to this, reconciliation happens after 10h default time. At this time, status would be set correctly.

This fix enforces reconciliation to make sure CR is updated correctly when vpce creation in aws is successful.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
@wjf3121
Copy link
Contributor

wjf3121 commented Jun 5, 2024

@nnbu @a-hilaly

I found a potential bug with this implementation: aws-controllers-k8s/community#1933 (comment)

It seems that we stop syncing the state once the VPCEndpoint becomes available. However an available VPCEndpoint can still later be rejected and the controller will not update the state accordingly.

@nnbu
Copy link
Contributor Author

nnbu commented Jun 5, 2024 via email

Comment on lines +1 to +8

if !vpcEndpointAvailable(&resource{ko}) {
// Setting resource synced condition to false will trigger a requeue of
// the resource. No need to return a requeue error here.
ackcondition.SetSynced(&resource{ko}, corev1.ConditionFalse, nil, nil)
} else {
ackcondition.SetSynced(&resource{ko}, corev1.ConditionTrue, nil, nil)
}
Copy link
Contributor

@wjf3121 wjf3121 Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nnbu Thanks for the explanation in this comment, it makes sense to me mostly! I still have one question over this implmenetation.

Here we consider a VPCEndpoint resource not synced as long as it's not in available state. However a VPC endpoint can be in pendingAcceptance state after creation and stay in that state forever/later be in rejected state -- either way it never becomes available

It seems to me that in controller's perspective it should be considered synced in these cases. In earlier versions of the controller that indeed is that case. But with this change, the VPCEndpoint will remain unsynced after its creation unless it becomes available later, which may never happen in some cases.

Does it make sense to only force the resource to be unsynced only when the resource is in pending instead of when it's in a non available state? This seems more inline with the endpoint states definition:

Pending - The service provider accepted the connection request. This is the initial state if requests are automatically accepted. The VPC endpoint returns to this state if the service consumer modifies the VPC endpoint.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have such automated pipeline that for a VPCEndpoint that requires manual acceptance, we first wait til the resource to be synced and then accept the connection through API. The change will break the behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nnbu @a-hilaly , if it makes sense to you, here's the fix I proposed on top of this change: #198

@nnbu
Copy link
Contributor Author

nnbu commented Jun 5, 2024 via email

@wjf3121
Copy link
Contributor

wjf3121 commented Jun 5, 2024

Your PR looks good to me if pending is the only transitionary state.

pending and deleting are the only two transitionary states AFAIK and deleting should already be handled correctly assuming the resource is fully managed by the controller. It would be awesome if you can confirm this internally:)

ack-prow bot pushed a commit that referenced this pull request Jul 15, 2024
Context: #191 (comment)

A synced VPC endpoint should only be forced to re-sync when it's in `pending` state. For other states like `pendingAcceptance`, `rejected`, `failed`, `expired`, it could be the final state of the endpoint and the sync should be considered completed.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
nnbu added a commit to nnbu/ack-ec2-controller that referenced this pull request Sep 18, 2024
…ollers-k8s#191)

Issue : aws-controllers-k8s/community#2075

Description of changes:
The issue is that vpce CR status is not updated. This happens because when vpce is created, it takes around a minute for it to be created in aws. Due to this, it's status is kept `pending`. But there is no sync enforced on the reconciler. Due to this, reconciliation happens after 10h default time. At this time, status would be set correctly.

This fix enforces reconciliation to make sure CR is updated correctly when vpce creation in aws is successful. 


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
nnbu pushed a commit to nnbu/ack-ec2-controller that referenced this pull request Sep 18, 2024
…trollers-k8s#198)

Context: aws-controllers-k8s#191 (comment)

A synced VPC endpoint should only be forced to re-sync when it's in `pending` state. For other states like `pendingAcceptance`, `rejected`, `failed`, `expired`, it could be the final state of the endpoint and the sync should be considered completed.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants