
push-build.sh container image pushes should precede staging GCS artifacts and writing version markers #1693

Closed
@justaugustus

Description


What happened:

Tracking issue for https://kubernetes.slack.com/archives/CJH2GBF7Y/p1604669572198400.
Noticed in kubernetes/test-infra#19483.

Our attempts to move the ci-kubernetes-build job to Community Infra are failing because container images are not being pushed successfully.

Comment from @ameukam (kubernetes/test-infra#19483 (comment)):

do this via adding the service account e-mail address to the [email protected] group?

ci-kubernetes-build-canary still fails even after the service account is added to [email protected] (see kubernetes/k8s.io#1393): https://testgrid.k8s.io/sig-testing-canaries#build-master-canary

The prow-build service account inherits the permissions of the roles/cloudbuild.builds.editor role as a member of [email protected]:

https://github.com/kubernetes/k8s.io/blob/74bfdc5741bdde3b8f489bdd8327474101b3b5e4/infra/gcp/lib.sh#L209-L231

which is not enough to make the job successful.

That's a credential issue that needs to be fixed in parallel.

This issue is specifically for some of my expectations around push-build.sh behavior.

What you expected to happen:

  1. Any build job should verify access to the container image registry before proceeding

This is a fail-fast scenario.
If we know that a build is supposed to push GCR images, we should check that we're able to do so first, instead of building artifacts and waiting for the container push to fail at the end of the job.
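
As a rough illustration, here is a minimal sketch of such a pre-flight check, assuming a hypothetical release::gcr::verify_push_access helper in push-build.sh and the existing logecho/common::exit helpers from the release tooling. Listing the repository only proves read access, so a complete check would also need to prove push rights (e.g. by pushing and then deleting a scratch tag).

# Hypothetical helper (not in push-build.sh today): fail fast if the target
# registry cannot be reached with the current credentials.
release::gcr::verify_push_access () {
  local registry="$1"   # e.g. the value passed via --docker-registry

  if [[ -z "$registry" ]]; then
    logecho "No registry specified; skipping registry access check"
    return 0
  fi

  logecho "Checking access to container registry: $registry"

  # Read-only approximation: this does not prove we can push.
  if ! gcloud container images list --repository="$registry" >/dev/null; then
    logecho "Unable to access $registry with the current credentials"
    return 1
  fi
}

# Called near the top of the push, before COPY RELEASE ARTIFACTS:
#   release::gcr::verify_push_access "$FLAGS_docker_registry" \
#     || common::exit 1 "Cannot push to $FLAGS_docker_registry. Exiting..."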

  2. The check for the existence of a build only checks for GCS bucket artifacts, not container images

In scenarios/kubernetes_build.py

https://github.com/kubernetes/test-infra/blob/329444781ba13be597917343cca4aa1b92366b6d/scenarios/kubernetes_build.py#L45-L84

If we consider a "complete" build to also include container images, this check should verify that those exist as well before claiming a build is not required.
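
The real check lives in Python in scenarios/kubernetes_build.py; purely for illustration, here is a bash sketch of the kind of image check that could complement the GCS check. The helper name and image list are assumptions for the example, and the '+' to '_' substitution mirrors the ${LATEST/+/_} mangling in push-build.sh, since Docker tags cannot contain '+'.

# Hypothetical helper: verify the container images for a given version exist
# before concluding that a new build is not required.
release::gcr::build_exists () {
  local registry="$1"          # e.g. the value passed via --docker-registry
  local version="$2"           # e.g. the version read from the GCS marker
  local tag="${version/+/_}"   # docker tags cannot contain '+'
  local image

  # Illustrative image list only; the real set is whatever the build pushes.
  for image in kube-apiserver kube-controller-manager kube-scheduler kube-proxy; do
    if ! gcloud container images describe \
           "${registry}/${image}:${tag}" >/dev/null 2>&1; then
      logecho "Missing ${registry}/${image}:${tag}; a new build is required"
      return 1
    fi
  done

  logecho "All expected container images exist for ${version}"
  return 0
}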

  3. A build should not push artifacts if it cannot guarantee that all of them will be available

The current push-build.sh logic (release/push-build.sh, lines 867 to 918 at 4c6b5aa):

##############################################################################
common::stepheader COPY RELEASE ARTIFACTS
##############################################################################
attempt=0
while ((attempt<max_attempts)); do
  if $USE_BAZEL; then
    release::gcs::bazel_push_build $GCS_DEST $LATEST $KUBE_ROOT/_output \
                                   $RELEASE_BUCKET && break
  else
    release::gcs::locally_stage_release_artifacts $LATEST \
                                                  $KUBE_ROOT/_output \
                                                  $FLAGS_release_kind

    if ((FLAGS_fast)); then
      BUILD_DEST="$GCS_DEST/fast"
    else
      BUILD_DEST="$GCS_DEST"
    fi

    release::gcs::push_release_artifacts \
      $KUBE_ROOT/_output/gcs-stage/$LATEST \
      gs://$RELEASE_BUCKET/$BUILD_DEST/$LATEST && break
  fi
  ((attempt++))
done
((attempt>=max_attempts)) && common::exit 1 "Exiting..."

if [[ -n "${FLAGS_docker_registry:-}" ]]; then
  ##############################################################################
  common::stepheader PUSH DOCKER IMAGES
  ##############################################################################

  # TODO: support Bazel too
  # Docker tags cannot contain '+'
  release::docker::release $FLAGS_docker_registry ${LATEST/+/_} \
                           $KUBE_ROOT/_output
fi

# If not --ci, then we're done here.
((FLAGS_ci)) || common::exit 0 "Exiting..."

if ! ((FLAGS_noupdatelatest)); then
  ##############################################################################
  common::stepheader UPLOAD to $RELEASE_BUCKET
  ##############################################################################
  attempt=0
  while ((attempt<max_attempts)); do
    release::gcs::publish_version $GCS_DEST $LATEST $KUBE_ROOT/_output \
      $RELEASE_BUCKET $GCS_EXTRA_VERSION_MARKERS && break
    ((attempt++))
  done
  ((attempt>=max_attempts)) && common::exit 1 "Exiting..."
fi

Here, we should probably attempt to publish artifacts in the following order:

  1. container images
  2. GCS artifacts
  3. version marker

That way, if images fail to push, then the build job fails before copying anything to GCS.
If there's nothing in the bucket, then the build-existence check in #2 above will cause a new build to always be attempted.
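
A rough sketch of that reordering, reusing the functions from the snippet above. The Bazel and --fast branches and the retry loops are omitted for brevity, and it assumes release::docker::release and the GCS helpers return non-zero on failure.

if [[ -n "${FLAGS_docker_registry:-}" ]]; then
  ##############################################################################
  common::stepheader PUSH DOCKER IMAGES
  ##############################################################################
  # Docker tags cannot contain '+'
  release::docker::release $FLAGS_docker_registry ${LATEST/+/_} \
                           $KUBE_ROOT/_output \
    || common::exit 1 "Image push failed; not staging GCS artifacts. Exiting..."
fi

##############################################################################
common::stepheader COPY RELEASE ARTIFACTS
##############################################################################
BUILD_DEST="$GCS_DEST"   # --fast handling omitted for brevity
release::gcs::locally_stage_release_artifacts $LATEST \
                                              $KUBE_ROOT/_output \
                                              $FLAGS_release_kind
release::gcs::push_release_artifacts \
  $KUBE_ROOT/_output/gcs-stage/$LATEST \
  gs://$RELEASE_BUCKET/$BUILD_DEST/$LATEST \
  || common::exit 1 "GCS artifact push failed. Exiting..."

# If not --ci, then we're done here.
((FLAGS_ci)) || common::exit 0 "Exiting..."

if ! ((FLAGS_noupdatelatest)); then
  ##############################################################################
  common::stepheader UPLOAD to $RELEASE_BUCKET
  ##############################################################################
  release::gcs::publish_version $GCS_DEST $LATEST $KUBE_ROOT/_output \
    $RELEASE_BUCKET $GCS_EXTRA_VERSION_MARKERS \
    || common::exit 1 "Version marker update failed. Exiting..."
fi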

@hasheddan -- I'll leave you to divide up the work as appropriate.

/assign @hasheddan @ameukam @cpanato
cc: @kubernetes/release-engineering @spiffxp
/priority critical-urgent

How to reproduce it (as minimally and precisely as possible):

See kubernetes/test-infra#19483.

Anything else we need to know?:

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Others:

Labels

  • area/release-eng
  • kind/bug
  • lifecycle/rotten
  • priority/critical-urgent
  • sig/release
