Skip to content

Route updates don't work in when two routes have same value for one of the route fields #2090

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
nnbu opened this issue Jun 14, 2024 · 6 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. service/ec2 Indicates issues or PRs that are related to ec2-controller.

Comments

@nnbu
Copy link

nnbu commented Jun 14, 2024

Describe the bug
When we change the routes (e.g. delete some of the routes) from the route table CR, the CR status goes to out of sync for the remaining routes.

Steps to reproduce
This happens in specific case where route comparison logic does not work. E.g. when we have two routes with say, same vpcPeeringConnectionID.

  1. Create following route table CR. Note that there are 2 routes with different destinationCIDRBlock , but same vpcPeeringConnectionID.
apiVersion: ec2.services.k8s.aws/v1alpha1
kind: RouteTable
metadata:
  name: rt1
spec:
  routes:
  - destinationCIDRBlock: 172.29.0.0/16
    gatewayID: igw-<xxx>
  - destinationCIDRBlock: 172.30.0.0/16
    vpcPeeringConnectionID: pcx-05f31992c7ca70be5
  - destinationCIDRBlock: 172.28.0.0/16
     vpcPeeringConnectionID: pcx-05f31992c7ca70be5
  tags:
  - key: Name
    value: abc
  vpcID: vpc-<xxxx>
  1. All 3 routes would get created correctly in aws and CR status will be sync
  2. Delete route 172.29.0.0/16 (without changing two routes with same vpcPeeringConnectionID ). This triggers update and controller tries to find what needs to be deleted/added.
  3. CR goes to out of sync state with error something like
RouteAlreadyExists: The route identified by 172.30.0.0/16 already exists.\n\tstatus
      code: 400, request id: a471eeab-5aa3-43c7-a146-ceb380c6b50f

This suggests that update tried to create route 172.30.0.0/16 and failed with RouteAlreadyExists error.

On prima facie, there appears to be two issues

  1. getMatchingRoute logic has issues when there is one field (vpcPeeringConnectionID) having same value for two routes. Due to this same value, it thinks, the route is getting updated from 172.28.0.0/16 to 172.30.0.0/16. So, it tries add 172.30.0.0/16 and fails with the error.

Ideally, this logic should not have determined any update for these two routes with vpcPeeringConnectionID.

  1. When update determines that there is a route to be deleted and added, it creates toAdd and toDelete lists.
    Ideally, delete should happen first and then add should happen later. But current logic is reverse.

Expected outcome
CR should not go into out of sync state.

Environment

  • Kubernetes version
  • Using EKS (yes/no), if so version?
  • AWS service targeted (S3, RDS, etc.)
@nnbu
Copy link
Author

nnbu commented Jun 14, 2024

In fact, this happens when any kind of update is performed on the CR.
e.g. If we add a new tag (keeping routes section untouched), that also triggers the update. and getMatchingRoute logic fails to understand routes have not been modified. Result is that CR goes out of sync with the above mentioned error.

So, as long as we have two routes with same value for one of the fields (in this case, vpcPeeringConnectionID field), we hit this issue

@nnbu nnbu changed the title Route updates don't work in specific cases Route updates don't work in when two routes have same value for one of the route fields Jun 14, 2024
@a-hilaly a-hilaly added service/route53 Indicates issues or PRs that are related to route53-controller. kind/bug Categorizes issue or PR as related to a bug. labels Jun 17, 2024
@a-hilaly
Copy link
Member

@ack-bot
Copy link
Collaborator

ack-bot commented Dec 14, 2024

Issues go stale after 180d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 60d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/aws-controllers-k8s/community.
/lifecycle stale

@ack-prow ack-prow bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 14, 2024
@gfrey
Copy link

gfrey commented Jan 30, 2025

I have the same problem, where there are multiple CIDR blocks in a peered VPC. The order as specified in ACK is different from what is returned by the API. As the getMatchingRoute returns the first route that has the same peering connection id, this results in all routes being recreated (at least in my case) on each reconciliation run.

Here a (slightly obfuscated) trace:

{"level":"debug","ts":"2025-01-30T07:35:54.304Z","msg":">> r.updateResource","kind":"RouteTable","name":"rtb-name","region":"us-east-1","is_adopted":false,"generation":2}

{"level":"info","ts":"2025-01-30T07:35:54.304Z","msg":"desired resource state has changed","kind":"RouteTable","name":"rtb-name","region":"us-east-1","is_adopted":false,"generation":2,
	"diff":[{"Path":{"Parts":["Spec","Routes"]},
		"A":[{"destinationCIDRBlock":"10.100.192.0/21","vpcPeeringConnectionID":"pcx"},{"destinationCIDRBlock":"10.71.16.0/21","vpcPeeringConnectionID":"pcx"},{"destinationCIDRBlock":"10.95.38.64/26","vpcPeeringConnectionID":"pcx"},{"destinationCIDRBlock":"100.64.0.0/16","vpcPeeringConnectionID":"pcx"}],
		"B":[{"destinationCIDRBlock":"10.95.38.64/26","vpcPeeringConnectionID":"pcx"},{"destinationCIDRBlock":"10.71.16.0/21","vpcPeeringConnectionID":"pcx"},{"destinationCIDRBlock":"10.100.192.0/21","vpcPeeringConnectionID":"pcx"},{"destinationCIDRBlock":"100.64.0.0/16","vpcPeeringConnectionID":"pcx"}]}]}

{"level":"debug","ts":"2025-01-30T07:35:54.304Z","msg":">>> rm.Update","kind":"RouteTable","name":"rtb-name","region":"us-east-1","is_adopted":false,"generation":2}
{"level":"debug","ts":"2025-01-30T07:35:54.304Z","msg":">>>> rm.customUpdateRouteTable","kind":"RouteTable","name":"rtb-name","region":"us-east-1","is_adopted":false,"generation":2}
{"level":"debug","ts":"2025-01-30T07:35:54.304Z","msg":">>>>> rm.syncRoutes","kind":"RouteTable","name":"rtb-name","region":"us-east-1","is_adopted":false,"generation":2}
{"level":"debug","ts":"2025-01-30T07:35:54.393Z","msg":"deleting route from route table","kind":"RouteTable","name":"rtb-name","region":"us-east-1","is_adopted":false,"generation":2}
{"level":"debug","ts":"2025-01-30T07:35:54.393Z","msg":">>>>>> rm.deleteRoute","kind":"RouteTable","name":"rtb-name","region":"us-east-1","is_adopted":false,"generation":2}
{"level":"debug","ts":"2025-01-30T07:35:54.557Z","msg":"<<<<<< rm.deleteRoute","kind":"RouteTable","name":"rtb-name","region":"us-east-1","is_adopted":false,"generation":2}
{"level":"debug","ts":"2025-01-30T07:35:54.557Z","msg":"deleting route from route table","kind":"RouteTable","name":"rtb-name","region":"us-east-1","is_adopted":false,"generation":2}
{"level":"debug","ts":"2025-01-30T07:35:54.557Z","msg":">>>>>> rm.deleteRoute","kind":"RouteTable","name":"rtb-name","region":"us-east-1","is_adopted":false,"generation":2}
{"level":"debug","ts":"2025-01-30T07:35:54.637Z","msg":"<<<<<< rm.deleteRoute","kind":"RouteTable","name":"rtb-name","region":"us-east-1","is_adopted":false,"generation":2}
{"level":"debug","ts":"2025-01-30T07:35:54.637Z","msg":"<<<<< rm.syncRoutes","kind":"RouteTable","name":"rtb-name","region":"us-east-1","is_adopted":false,"generation":2}
{"level":"debug","ts":"2025-01-30T07:35:54.637Z","msg":"<<<< rm.customUpdateRouteTable","kind":"RouteTable","name":"rtb-name","region":"us-east-1","is_adopted":false,"generation":2}
{"level":"debug","ts":"2025-01-30T07:35:54.637Z","msg":"<<< rm.Update","kind":"RouteTable","name":"rtb-name","region":"us-east-1","is_adopted":false,"generation":2,"latest":{},"error":"InvalidRoute.NotFound: no route with destination-cidr-block 10.95.38.64/26 in route table rtb-XXX\n\tstatus code: 400, request id: XXX"}
{"level":"debug","ts":"2025-01-30T07:35:54.637Z","msg":"<< r.updateResource","kind":"RouteTable","name":"rtb-name","region":"us-east-1","is_adopted":false,"generation":2,"error":"InvalidRoute.NotFound: no route with destination-cidr-block 10.95.38.64/26 in route table rtb-XXX\n\tstatus code: 400, request id: XXX"}

Long story short: it creates two calls to delete the route with the destination CIDR block 10.95.38.64/26. Second one fails, so it never gets to adding routes again. This pattern repeats until all routes are deleted, then they are added again.

I'm wondering about this code block. What is the reason for declaring a route matching, just because one property is?

@michaelhtm michaelhtm added service/ec2 Indicates issues or PRs that are related to ec2-controller. and removed service/route53 Indicates issues or PRs that are related to route53-controller. labels Jan 30, 2025
@a-hilaly a-hilaly removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 30, 2025
ack-prow bot pushed a commit to aws-controllers-k8s/ec2-controller that referenced this issue Mar 10, 2025
Fixes issue [#2090](aws-controllers-k8s/community#2090)

Description of changes:
- Configure generator for RouteTable to ignore the Routes from the generated delta code.
- Extend the `customPreCompare` code to take care of the routes (computing the delta ignoring the order of routes).
- Added unit test for the custom code.
- Change the `syncRoutes` code to use the delta from the compare phase.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
@michaelhtm
Copy link
Member

Hello @nnbu, we have released the controller with a fix addressing this issue.
Feel free to reopen this issue if it is not. Thanks!
/close

@ack-prow ack-prow bot closed this as completed Mar 12, 2025
Copy link

ack-prow bot commented Mar 12, 2025

@michaelhtm: Closing this issue.

In response to this:

Hello @nnbu, we have released the controller with a fix addressing this issue.
Feel free to reopen this issue if it is not. Thanks!
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. service/ec2 Indicates issues or PRs that are related to ec2-controller.
Projects
None yet
Development

No branches or pull requests

5 participants