[release-2.6] 🐛: elbv2: wait for LB active state instead of resolving DNS name #5226

k8s-infra-cherrypick-robot

This is an automated cherry-pick of #5093

/assign nrb

Check for the LB "active" status instead of trying to resolve the DNS name to validate the LB is ready.

Using DNS name resolution to check that the load balancer is working
can cause problems that depend on the host running CAPA. On some
systems, DNS resolution can keep failing because DNS responses are
cached with very large TTLs, causing very long provisioning times.

Instead of DNS resolution, let's use the AWS API to check for the load
balancer "active" state. Waiting for resolvable DNS names should be left
for the clients to do.
k8s-ci-robot added the release-note (denotes a PR that will be considered when it comes time to generate release notes) and cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) labels on Nov 22, 2024
k8s-ci-robot requested review from damdo and faiq on November 22, 2024 14:31
k8s-ci-robot added the needs-priority and size/M (denotes a PR that changes 30-99 lines, ignoring generated files) labels on Nov 22, 2024
Contributor

@AndiDog AndiDog left a comment

/approve

k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Nov 23, 2024
@damdo
Member

damdo commented Nov 23, 2024

/test pull-cluster-api-provider-aws-e2e

@k8s-ci-robot
Contributor

@damdo: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-provider-aws-build
  • /test pull-cluster-api-provider-aws-build-docker
  • /test pull-cluster-api-provider-aws-build-docker-release-2-6
  • /test pull-cluster-api-provider-aws-build-release-2-6
  • /test pull-cluster-api-provider-aws-test-release-2-6
  • /test pull-cluster-api-provider-aws-verify-release-2-6

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-provider-aws-apidiff-release-2-6
  • /test pull-cluster-api-provider-aws-e2e-blocking-release-2-6
  • /test pull-cluster-api-provider-aws-e2e-conformance-release-2-6
  • /test pull-cluster-api-provider-aws-e2e-conformance-with-ci-artifacts-release-2-6
  • /test pull-cluster-api-provider-aws-e2e-eks-gc-release-2-6
  • /test pull-cluster-api-provider-aws-e2e-eks-release-2-6
  • /test pull-cluster-api-provider-aws-e2e-eks-testing-release-2-6
  • /test pull-cluster-api-provider-aws-e2e-release-2-6

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-provider-aws-apidiff-release-2-6
  • pull-cluster-api-provider-aws-build
  • pull-cluster-api-provider-aws-build-docker
  • pull-cluster-api-provider-aws-build-docker-release-2-6
  • pull-cluster-api-provider-aws-build-release-2-6
  • pull-cluster-api-provider-aws-test-release-2-6
  • pull-cluster-api-provider-aws-verify-release-2-6

In response to this:

/test pull-cluster-api-provider-aws-e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@r4f4
Contributor

r4f4 commented Nov 25, 2024

/test pull-cluster-api-provider-aws-e2e-release-2-6

@Ankitasw
Member

/test pull-cluster-api-provider-aws-e2e-release-2-6

@r4f4
Contributor

r4f4 commented Dec 5, 2024

STEP: Searching for AMI: name=capa-ami-ubuntu-18.04-v1.26.6* @ 11/26/24 12:50:42.09

@richardcase it seems the deprecated AMIs issue needs to be solved for other branches as well?

@r4f4
Contributor

r4f4 commented Mar 3, 2025

/test pull-cluster-api-provider-aws-e2e-release-2-6

@patrickdillon

Looking at pull-cluster-api-provider-aws-e2e-release-2-6 failure:


  [FAILED] Expected
      <int>: 0
  not to be zero-valued
  In [SynchronizedBeforeSuite] at: /home/prow/go/src/sigs.k8s.io/cluster-api-provider-aws/test/e2e/shared/aws.go:868 @ 03/03/25 08:06:32.008

  Full Stack Trace
    sigs.k8s.io/cluster-api-provider-aws/v2/test/e2e/shared.conformanceImageID(0xc000966580)
    	/home/prow/go/src/sigs.k8s.io/cluster-api-provider-aws/test/e2e/shared/aws.go:868 +0x47

It looks like the failure may correspond to missing creds?

return &iam.AccessKey{
	AccessKeyId:     out.AccessKey.AccessKeyId,
	SecretAccessKey: out.AccessKey.SecretAccessKey,
}

@nrb
Contributor

nrb commented Mar 3, 2025

@patrickdillon That is a flake that happens sometimes; we haven't nailed it down.

/test pull-cluster-api-provider-aws-e2e-release-2-6
/approve

/assign @damdo

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AndiDog, nrb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@damdo
Member

damdo commented Mar 4, 2025

/test pull-cluster-api-provider-aws-e2e-release-2-6

@damdo
Member

damdo commented Mar 4, 2025

The e2e tests are failing because the capa-ami-ubuntu-18.04-v1.26.6 AMI doesn't exist anymore as it has been cleaned up.

@richardcase what do we suggest here? Switch to 22.04 also for release-2.6 or republish old images for this?

We might also need a better story around automated cleanup; we don't want to lose the ability to run tests on older releases.

@richardcase
Member

I'd say we need to support tests on previous release branches for a short while. Maybe 3 release branches?

We could update the tests to use newer versions of k8s/ubuntu? But how much effort should we put into fixing e2e on an old branch?

@damdo
Member

damdo commented Mar 4, 2025

@richardcase Yeah, I agree: with the current CAPA staffing we might not be able to keep up with tests on releases too far back. This is n-1 w.r.t. the current n (latest), which is 2.7, so I think it would be well within reasonable ranges that we can define.

@richardcase
Member

> @richardcase Yeah I agree with the current CAPA staffing we might not be able to keep up with tests on releases too far back. This is n-1 w.r.t. the current n (latest), which is 2.7, so I think it would be well within reasonable ranges that we can define.

Shall we just remove the e2e tests from all branches except:

  • main
  • release-2.7

Then I think when 2.8 is released we can keep it as:

  • main
  • release-2.7
  • release-2.8

I agree with you: with the number of active contributors to CAPA, we need to pick where we put our effort.

Perhaps we can jump on a call with @nrb and we can come to a consensus.

@nrb
Contributor

nrb commented Mar 4, 2025

I'm +1 to dropping the 2.6 tests. While it would be nice to have an updated, LTS OS for older branches, I don't think it's a valuable use of time or effort to keep these branches testing right now.

@patrickdillon

@nrb @damdo are you ok moving forward with this or do we need to do anything else?

@damdo
Member

damdo commented Mar 7, 2025

> > Yeah I agree with the current CAPA staffing we might not be able to keep up with tests on releases too far back. This is n-1 w.r.t. the current n (latest), which is 2.7, so I think it would be well within reasonable ranges that we can define.
>
> Shall we just remove the e2e tests from all branches except:
>
>   • main
>   • release-2.7
>
> Then I think when 2.8 is released we can keep it as:
>
>   • main
>   • release-2.7
>   • release-2.8
>
> I agree with you, with the number of active contributors to CAPA we need to pick where we put our effort.
>
> Perhaps we can jump on a call with @nrb and we can come to a consensus.

Yes, it's not an ideal situation, but we need to be pragmatic: we can't maintain CI for older releases without more staffing.
As such, I am happy with @richardcase's proposal, and considering @nrb is happy too, let's do this.
We can always revisit this decision in the future if significant parties/contributions arise in this area.

@damdo
Member

damdo commented Mar 7, 2025

I guess as far as this PR is concerned, we can LGTM and merge as is.

@damdo
Member

damdo commented Mar 7, 2025

Considering the above conversation and having @nrb @richardcase and @fiunchinho's green light, I'll proceed with merging this.

/lgtm

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Mar 7, 2025
@damdo
Member

damdo commented Mar 7, 2025

/test pull-cluster-api-provider-aws-test-release-2-6

@k8s-ci-robot
Contributor

k8s-ci-robot commented Mar 7, 2025

@k8s-infra-cherrypick-robot: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-provider-aws-e2e-release-2-6 | a38f759 | link | false | /test pull-cluster-api-provider-aws-e2e-release-2-6 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@r4f4
Contributor

r4f4 commented Mar 7, 2025

/test pull-cluster-api-provider-aws-test-release-2-6

@damdo
Member

damdo commented Mar 7, 2025

@r4f4 it might need rebasing

@k8s-ci-robot k8s-ci-robot merged commit 2361956 into kubernetes-sigs:release-2.6 Mar 7, 2025
19 of 20 checks passed
@r4f4
Contributor

r4f4 commented Mar 7, 2025

It merged! Thanks for the help.

Labels
  • approved (indicates a PR has been approved by an approver from all required OWNERS files)
  • cncf-cla: yes (indicates the PR's author has signed the CNCF CLA)
  • lgtm ("Looks good to me", indicates that a PR is ready to be merged)
  • needs-priority
  • release-note (denotes a PR that will be considered when it comes time to generate release notes)
  • size/M (denotes a PR that changes 30-99 lines, ignoring generated files)
9 participants