Transparent resource adoption #1381
TL;DR: Hi @liger1978, thank you for bringing this to our attention and proposing a creative solution. However, I believe that if ACK deletion logic works as expected, your stack should be GitOps compliant without the need to adopt any resources. Any resource adoption, if needed, should be a one-time thing; the ACK resource generated by the adoption can then be managed in a GitOps fashion.

Original Problem
How are you performing the cleanup: manually deleting resources from AWS, or deleting the K8s resource manifests? And during teardown, wouldn't you want to make sure that all the resources were successfully deleted before recreating the stack? Why reuse old resources that were meant to be deleted?

Overall GitOps Experience

If you started with no adopted resources and were creating the whole stack using ACK resources, I would rather fix the bugs in the deletion logic and make sure an idempotent GitOps experience is achievable that way.

Current Adopted Resource GitOps Experience

Currently, if an AdoptedResource is present in the GitOps manifests, it will create an ACK resource that is not present in the GitOps manifests. I can imagine a two-step process to be completely GitOps compliant: first you add the AdoptedResource to the GitOps manifests, which will create an ACK resource; then, after successful adoption, you replace the AdoptedResource in the GitOps manifests with the actual ACK resource manifest. Once the resource is adopted and the manifests are updated, the resulting manifests will be GitOps compliant, and during tear-down and recreation you will not need to adopt the resource again.
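As a sketch of that two-step process for an IAM policy (resource names are illustrative, and the exact AdoptedResource CRD fields should be verified against your ACK version):

```yaml
# Step 1: commit an AdoptedResource pointing at the existing AWS resource.
# Field names follow my reading of the ACK adoption CRD; verify before use.
apiVersion: services.k8s.aws/v1alpha1
kind: AdoptedResource
metadata:
  name: adopt-my-policy
spec:
  aws:
    nameOrID: my-policy          # the existing AWS resource to adopt
  kubernetes:
    group: iam.services.k8s.aws  # target ACK resource type
    kind: Policy
    metadata:
      name: my-policy            # name of the ACK resource to create
```

After the controller materializes the `Policy` resource, step 2 is to replace this manifest in Git with the generated `Policy` manifest, at which point the repo fully describes the stack.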
In some cases the managing EKS cluster is destroyed and rebuilt. This is typically the situation that causes the ACK-managed resources in AWS to be orphaned in our case. When the cluster is rebuilt and processes its GitOps repo, the reinstalled ACK controllers attempt to recreate the orphaned resources and fail because they already exist. There is no bug AFAIK.

Not when we tear down the managing cluster. We typically don't want to destroy the AWS resources the cluster manages with the ACK controllers, just the cluster itself.
@liger1978, gotcha! Thanks for providing more context.
@vijtrip2 No problem. Another scenario, this time in production instead of development, is geographic failover of our EKS management cluster. Management cluster cluster1 in us-east-1 goes down due to a regional outage, so we quickly spin up cluster2 in us-west-1 pointed at the same GitOps repo. We would like all the existing ACK resources defined in the repo to be automatically adopted when the new cluster takes over their management.
@liger1978 Not all resources can be defined using the properties in their spec. Many AWS resources use an auto-generated name (such as the EC2 instance ID) and then require that all references are made using this name or the ARN. In those cases, there is no combination of spec fields that would properly identify which existing resource to adopt. The controller would treat every new K8s custom resource as a new, separate object, leaving the existing resources hanging. This was the biggest reason we went with a separate CRD for adoption, so that we didn't need to modify the spec fields for any existing resources.

Secondly, although the spec of a K8s CR should define the full desired state of a resource, most of the time a user will only provide a partial spec for an ACK resource and rely on the AWS service to fill in the defaults. That is, you probably aren't going to use every single field in every single ACK custom resource, but will instead rely on the fact that the AWS service will use the default values for anything left undefined. In those cases, the ACK controller persists the server-side defaults back into the spec of the object so that the next reconciliation loop understands what the default values are - and therefore whether to attempt to override them (if modified). When you adopt an existing AWS resource using an annotation on a partially defined ACK resource, the K8s controller cannot know what the server-side default values are. Most of the time, these defaults are only returned when we create the resource for the first time, so on a subsequent reconciliation the controller cannot tell an undefined field apart from an intentional override.

I apologise for the large paragraphs of text, but I realise these nuances have not been explained anywhere else in the design documents or online documentation for the adopted resource CRDs.
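To illustrate the server-side defaults problem (the resource type and field values here are purely illustrative, not real ACK output): a user applies a partial spec, and after creation the controller writes the service's defaults back into it:

```yaml
# What the user commits to Git (partial spec)
apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  name: my-bucket
spec:
  name: my-bucket
---
# What the spec looks like after the first reconciliation: the controller
# has persisted server-side defaults (field and value are illustrative)
apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  name: my-bucket
spec:
  name: my-bucket
  acl: private   # default filled in by the service at create time
```

For an adopted resource, the controller never sees the create-time response, so it has no source for those default values.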
I would love to provide an annotation to allow resource adoption in the way you described, but I worry that these cases (admittedly they are edge cases) could end up causing a confusing user experience. That is not to say we won't ever support functionality like this - GitOps compliance is incredibly important to the project, and suggestions like yours provide important insights into how the controllers are being used.
@RedbackThomson No problem! For clarity, I am henceforth going to refer to my proposed process of managing existing AWS resources as "absorption" to make it distinct from your existing adoption process.

OK, a couple of options for absorption here. Option 2, match by ARN:

```yaml
metadata:
  annotations:
    services.k8s.aws/absorb_existing: "true"
    services.k8s.aws/absorb_match_arn: "arn:aws:iam::123456789012:policy/my-policy"
```

Option 3, match by tags:

```yaml
metadata:
  annotations:
    services.k8s.aws/absorb_existing: "true"
    services.k8s.aws/absorb_match_tags: "role=bastion_server,env=dev"
```

Option 2 is likely the easiest to implement and has the least ambiguity about what is going to happen. It would suit us as things stand. Option 3 would cover more resource types, and it is possible we will require it in future, in addition to option 2, as we manage more resource types with ACK.
I would be perfectly happy with this. The K8s resource would be in an error state, and hopefully the API would return the missing value, which would display in the resource's status.
You have nothing to apologise for. I appreciate the quick and thoughtful responses from the ACK team! I understand the original design decisions, but as things stand we can't really do idempotent GitOps where our target environments are remediated to match the desired state in the repo.
Actually, I have been mulling this over for a while and I'm starting to come back around on this idea. I'd say the main use-case for the AdoptedResource CRD is resources that were never described in manifests at all. However, you're offering a different situation, wherein a user already has fully-formed manifests, created elsewhere, and wants to continue reconciliation in a new context. Apart from the edge cases I identified previously, there isn't anything fundamentally wrong with that.
Similar disaster recovery use case from the EBS CSI driver: kubernetes-sigs/aws-ebs-csi-driver#1160
Issues go stale after 90d of inactivity. |
Stale issues rot after 30d of inactivity. |
/lifecycle frozen |
I suggest that adopting existing resources by tag is enough. At least by default, at the moment there are tags:

If you additionally had tags with the Kubernetes resource name, e.g.

Then you should have all you need to link a resource back to its recreated Kubernetes form.
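As a sketch of what such tags might look like (the tag keys below are assumptions for illustration, not necessarily ACK's actual default tag keys):

```yaml
# Hypothetical tags on the AWS resource; keys and values are illustrative
tags:
  services.k8s.aws/controller-version: "iam-controller-v1.3.0"
  services.k8s.aws/namespaced-name: "default/my-policy"  # links back to the K8s resource
```

A rebuilt cluster could then match each manifest against the namespaced-name tag to find the AWS resource it should resume managing.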
Have you got any update on this issue?
Hi, any update? |
Addressed this issue in runtime |
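For readers arriving later, a sketch of the annotation-driven adoption that later ACK releases describe. The annotation names here reflect my understanding of current ACK documentation and should be verified against your controller version:

```yaml
# Sketch: adoption annotations on an ordinary ACK resource (verify names
# and accepted values against the ACK docs for your release)
apiVersion: iam.services.k8s.aws/v1alpha1
kind: Policy
metadata:
  name: my-policy
  annotations:
    services.k8s.aws/adoption-policy: adopt-or-create
    services.k8s.aws/adoption-fields: |
      {"arn": "arn:aws:iam::123456789012:policy/my-policy"}
spec:
  name: my-policy
```

This is close in spirit to the "absorption" proposal above: the resource is adopted if it exists and created if it does not.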
@michaelhtm: Closing this issue. In response to this:
Is your feature request related to a problem?

It seems that with AdoptedResources, I must know ahead of time if the resource within AWS already exists and make a choice based on this: if it does, I must use an AdoptedResource to adopt it; if it does not, I must define the actual ACK resource.

This makes declarative, idempotent GitOps difficult or impossible if I do not already know the full state of the target environment. For example, if I am tearing down and rebuilding environments rapidly (which I do during development), some resources may not be cleaned up properly during teardown, so I would have to use an AdoptedResource when rebuilding. Others will have been destroyed correctly, so I would have to specify the actual ACK resource (e.g. an IAM policy resource) to recreate them.

Describe the solution you'd like
I would like the option for an ACK resource (e.g. an IAM policy resource) to transparently create or adopt the AWS resource in the target environment. If the AWS resource already exists, adopt it and remediate it to match the desired state. If it does not already exist, create it.
So that this is non-breaking, I suggest a new annotation for all ACK resources:

The default should be "false" (current behaviour), but when set to "true", defined ACK resources will automatically start managing existing AWS resources if they already exist.

Describe alternatives you've considered
Can't think of anything. I've tried everything I can think of with the current AdoptedResources functionality, but I can't get it to do what I want.