fix(grpc): Add startupProbe to check for grpc health readiness #2791

Merged

dinhxuanvu
Member

Currently, liveness and readiness probes may fail because the gRPC
server is not yet ready. Adding a startupProbe ensures that gRPC is
ready before the liveness and readiness probes are triggered.

Signed-off-by: Vu Dinh [email protected]
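
As a rough illustration of the description above: Kubernetes holds off liveness and readiness checks until the startup probe has succeeded, and the container gets roughly failureThreshold × periodSeconds to pass it. The sketch below is not the code from this PR. The `Probe` type is a local stand-in that borrows Kubernetes field names, and the command, port, and threshold values are assumptions chosen for illustration.

```go
package main

import "fmt"

// Probe is a local stand-in for a Kubernetes container probe.
// Only the fields needed for this sketch are included.
type Probe struct {
	Command          []string // exec command run inside the container
	PeriodSeconds    int32    // how often the probe is run
	FailureThreshold int32    // consecutive failures tolerated before giving up
}

// startupBudgetSeconds returns roughly how long a container may take to
// start before it is restarted: failureThreshold * periodSeconds.
func startupBudgetSeconds(p Probe) int32 {
	return p.FailureThreshold * p.PeriodSeconds
}

func main() {
	// Hypothetical values, not the ones chosen in this PR.
	startup := Probe{
		Command:          []string{"grpc_health_probe", "-addr=:50051"}, // assumed command and port
		PeriodSeconds:    10,
		FailureThreshold: 15,
	}
	fmt.Printf("startup budget: %ds\n", startupBudgetSeconds(startup))
}
```

With a budget like this, a slow-starting gRPC server is given time to come up before liveness failures can trigger a restart.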

Description of the change:

Motivation for the change:

Architectural changes:

Testing remarks:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied by evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted

@dinhxuanvu dinhxuanvu requested a review from joelanford June 2, 2022 18:41
@openshift-ci openshift-ci bot requested a review from perdasilva June 2, 2022 18:41
@openshift-ci

openshift-ci bot commented Jun 2, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dinhxuanvu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 2, 2022
@grokspawn
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 2, 2022
@openshift-merge-robot openshift-merge-robot merged commit ba59fd0 into operator-framework:master Jun 2, 2022
@grokspawn
Contributor

grokspawn commented Jun 16, 2022

Several technical options were discussed to attempt to resolve this issue:

  1. Make gRPC serving and FBC reading disjoint, so that the endpoint returns a "not ready" response (e.g. '202 -- Request Accepted') until the FBC load is complete.
  2. As part of type: image preparation, pre-process the FBC into a binary blob, so that serving the FBC goes from a linear-time load to a constant-time one.
  3. Add a startupProbe of sufficient length, so that the framework understands that the CatalogSource pod takes more time to become ready initially.

Option 3 was the fastest to implement while we continue to work through the problem domain to see whether further effort is needed, either to resolve some additional problem or to simplify the catalog release issues that might arise from the selected approach.
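
For comparison, option 1 in the list above could look roughly like the following. This is a minimal sketch under stated assumptions, not how operator-registry is implemented: `Catalog`, `Load`, `Health`, and the status values are hypothetical names, and the actual FBC parsing is elided.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Status is a simplified stand-in for a health-check response.
type Status string

const (
	NotServing Status = "NOT_SERVING"
	Serving    Status = "SERVING"
)

// Catalog (hypothetical) tracks whether the FBC load has finished.
type Catalog struct {
	loaded atomic.Bool
}

// Load stands in for reading the FBC. A real server would likely run
// this in the background while the health endpoint is already up.
func (c *Catalog) Load() {
	// ... parse catalog content here (elided) ...
	c.loaded.Store(true)
}

// Health answers immediately and reports "not ready" until Load is done.
func (c *Catalog) Health() Status {
	if c.loaded.Load() {
		return Serving
	}
	return NotServing
}

func main() {
	c := &Catalog{}
	fmt.Println(c.Health()) // queried before the load completes
	c.Load()
	fmt.Println(c.Health()) // queried after the load completes
}
```

The trade-off against option 3 is that the server itself has to distinguish "up" from "loaded", whereas a startupProbe only changes how long the platform waits.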

@grokspawn
Contributor

Just in case anyone is landing here, looking for what happened next, the direction is here

@alfieyfc

@grokspawn I see this fix landed after the latest release, v0.21.2. Is there a way to install OLM with this fix now? Do we have to wait for operator-framework/operator-registry#977 to be merged for this to work? Or is there a page where users can check when the next release is scheduled? 🙏

@dinhxuanvu
Member Author

> @grokspawn I see this fix landed after the latest release, v0.21.2. Is there a way to install OLM with this fix now? Do we have to wait for operator-framework/operator-registry#977 to be merged for this to work? Or is there a page where users can check when the next release is scheduled? 🙏

We will release a new version of OLM soon, sometime this week I believe. Stay tuned.
