Description
What happened:
Tracking issue for https://kubernetes.slack.com/archives/CJH2GBF7Y/p1604669572198400.
Noticed in kubernetes/test-infra#19483.
Our attempts to move the ci-kubernetes-build job to Community Infra are failing because container images are not getting pushed successfully.
Comment from @ameukam (kubernetes/test-infra#19483 (comment)):
do this via adding the service account e-mail address to the [email protected] group?
ci-kubernetes-build-canary still fails even after the service account is added (see kubernetes/k8s.io#1393) to [email protected]: https://testgrid.k8s.io/sig-testing-canaries#build-master-canary
The prow-build service account inherits the permissions of the role roles/cloudbuild.builds.editor as a member of [email protected], which is not enough to make the job successful.
That's a credential issue that needs to be fixed in parallel.
This issue is specifically for some of my expectations around push-build.sh behavior.
What you expected to happen:
- Any build jobs should verify access to the container image registry before proceeding
This is a fail-fast scenario.
If we know that a build is supposed to push GCR images, we should check that we're able to do that first, instead of building artifacts and waiting for the container push to fail at the end of the scenario.
- The check for the existence of a build only checks for GCS bucket artifacts, not container images
This check lives in scenarios/kubernetes_build.py.
If we consider a "complete" build to also include container images, the check should verify that those exist as well before claiming a build is not required.
- A build should not push artifacts if it cannot guarantee that all of them will be available
See the current push-build.sh logic at lines 867 to 918 in 4c6b5aa.
Here, we should probably attempt to publish artifacts in the following order:
- container images
- GCS artifacts
- version marker
That way, if images fail to push, then the build job fails before copying to GCS.
If there's nothing in the bucket, then the check in #1 will cause a new build to always be attempted.
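To make the three expectations above concrete, here is a rough Python sketch. Everything in it is hypothetical: build_required, run_build, publish_steps and the step callables are placeholder names, not functions from push-build.sh or scenarios/kubernetes_build.py. The real checks would call whatever registry and GCS tooling the job already uses.

```python
from typing import Callable, List, Sequence, Tuple

# (step name, action returning True on success); all actions are hypothetical stand-ins.
Step = Tuple[str, Callable[[], bool]]


def build_required(gcs_artifacts_exist: bool, images_exist: Sequence[bool]) -> bool:
    """Skip a build only when the GCS artifacts AND every expected image exist."""
    return not (gcs_artifacts_exist and all(images_exist))


def run_build(registry_writable: Callable[[], bool], steps: Sequence[Step]) -> List[str]:
    """Check registry access first, then run steps in order, stopping at the first failure."""
    if not registry_writable():
        # Fail fast: nothing has been built or published yet.
        raise RuntimeError("cannot push to the container registry; aborting before the build")
    completed: List[str] = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"step {name!r} failed; completed so far: {completed}")
        completed.append(name)
    return completed


def publish_steps(build, push_images, copy_to_gcs, write_marker) -> List[Step]:
    """Proposed order: images first, then GCS artifacts, then the version marker."""
    return [
        ("build", build),
        ("container images", push_images),
        ("GCS artifacts", copy_to_gcs),
        ("version marker", write_marker),
    ]
```

With this ordering, a failed image push raises before anything is copied to GCS, so a later build_required check still reports that a build is needed.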
@hasheddan -- I'll leave you to divide up the work as appropriate.
/assign @hasheddan @ameukam @cpanato
cc: @kubernetes/release-engineering @spiffxp
/priority critical-urgent
How to reproduce it (as minimally and precisely as possible):
See kubernetes/test-infra#19483.
Anything else we need to know?:
Environment:
- Cloud provider or hardware configuration:
- OS (e.g: cat /etc/os-release):
- Kernel (e.g. uname -a):
- Others: