Skip to content

Certificate handling improvements for preview environments #9686

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 2, 2022

Conversation

mads-hartmann
Copy link
Contributor

@mads-hartmann mads-hartmann commented May 2, 2022

Description

We have seen sporadic problem when issuing certificate for our preview environments. It appears to be a problem with CertManager and that upgrading to a newer version might reduce the chances of getting into this problem, but from reports it doesn't seem to eliminate it completely.

This PR:

  1. Makes it easier for developers to know how to proceed when certificates fail. We delete the certificate and instruct them to re-run their job
  2. Helps the platform team more proactively debug and fix these issues by tagging us in a Slack message and pointing us to a new internal guide we can improve over time.

Additionally the PR improves a few things

  1. Use werft.fail instead of throwing errors directly. This improves the werft logs and the underlying trace while still resulting in an error being thrown.
  2. Removing superfluous try/catch/throw error code
  3. Fixes some (I believe) unwanted parallelism - see comments directly in the code.

Related Issue(s)

Fixes https://github.com/gitpod-io/ops/issues/2066

How to test

I haven't been able to validate the error-handling code for when the certificates fail due to the nature of the issues.

Release Notes

NONE

Documentation

N/A

@@ -142,7 +142,7 @@ export async function deployToPreviewEnvironment(werft: Werft, jobConfig: JobCon

try {
werft.log(vmSlices.COPY_CERT_MANAGER_RESOURCES, 'Copy over CertManager resources from core-dev')
installMetaCertificates(werft, jobConfig.repository.branch, withVM, 'default', PREVIEW_K3S_KUBECONFIG_PATH, vmSlices.COPY_CERT_MANAGER_RESOURCES)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we want this to be blocking and not fire-and-forget. The certificates has previously been issues earlier in the job, so this is waiting for them to be ready and then copying them over.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -24,7 +23,7 @@ export async function prepare(werft: Werft, config: JobConfig) {
configureStaticClustersAccess()
werft.done(prepareSlices.CONFIGURE_CORE_DEV)

issueCertificate(werft, config)
await issueCertificate(werft, config)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just creating the Certificate CRD resources that CertManager uses. I don't believe we want to do so as a "fire-and-forget" promise

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed - assuming we only create the CR and don't actually wait for the cert to be issued.

const certName = config.withVM ? `harvester-${previewNameFromBranchName(config.repository.branch)}` : `staging-${previewNameFromBranchName(config.repository.branch)}`
const domain = config.withVM ? `${config.previewEnvironment.destname}.preview.gitpod-dev.com` : `${config.previewEnvironment.destname}.staging.gitpod-dev.com`;

werft.log(prepareSlices.ISSUE_CERTIFICATES, prepareSlices.ISSUE_CERTIFICATES)
issueMetaCerts(werft, certName, "certs", domain, config.withVM, prepareSlices.ISSUE_CERTIFICATES)
await issueMetaCerts(werft, certName, "certs", domain, config.withVM, prepareSlices.ISSUE_CERTIFICATES)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the comment above, this is just creating the Certificate, no need to use "fire-and-forget"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@mads-hartmann mads-hartmann marked this pull request as ready for review May 2, 2022 11:44
@mads-hartmann mads-hartmann requested a review from a team May 2, 2022 11:44
Copy link
Member

@meysholdt meysholdt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes LGTM

@roboquat roboquat merged commit 1b3b0a8 into main May 2, 2022
@roboquat roboquat deleted the mads/cert-issuing branch May 2, 2022 11:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants