Skip to content

🐛 improve poor performance of helm chart conversion #1050

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

joelanford
Copy link
Member

@joelanford joelanford commented Jul 15, 2024

Description

A few ways that performance is improved:

  1. Remove the unnecessary storage abstraction. We can convert directly from the unpacked bundle FS.
  2. Use standard tooling from cli-runtime to parse manifests from the registry+v1 bundle (e.g. don't read file into buffer and then parse the buffer)
  3. Don't write to a plain FS and then turn around and immediately re-parse the plain FS. Skip the plain FS and generate the chart directly.
  4. Use JSON encoding rather than YAML encoding. The YAML encoding traverses JSON anyway, so this just removes an unnecessary step. Also, the Helm client has a performance benefit later by not having to convert YAML to JSON when reading the chart we generate.

Other ancillary changes:

  1. Put unpacked catalogs in a separate <cacheDir>/catalogs directory (rather than directly in <cacheDir>).
  2. Add debug logging and nanosecond-level granularity in log lines.
  3. Remove a bunch of unused code from rukpak (handlers, provisioners, storage implementations)

Motivation

Performance improvements of the reconcile function, and thus a more reactive system.

Locally, I noticed e2e drops from 323s to 231s.

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@joelanford joelanford requested a review from a team as a code owner July 15, 2024 15:07
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 15, 2024
Copy link

netlify bot commented Jul 15, 2024

👷 Deploy Preview for olmv1 processing.

Name Link
🔨 Latest commit eb5f63f
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/66953b4dab6d1a00072d74f5

Copy link

netlify bot commented Jul 15, 2024

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 67352d1
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/66953f904f9d310008a7a462
😎 Deploy Preview https://deploy-preview-1050--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@joelanford joelanford force-pushed the faster-helm-chart-generation branch from eb5f63f to 5bd1edd Compare July 15, 2024 15:09
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 15, 2024
@tmshort
Copy link
Contributor

tmshort commented Jul 15, 2024

And it already needs a rebase...

@joelanford joelanford force-pushed the faster-helm-chart-generation branch from 5bd1edd to 67352d1 Compare July 15, 2024 15:26
Copy link

codecov bot commented Jul 15, 2024

Codecov Report

Attention: Patch coverage is 68.85246% with 19 lines in your changes missing coverage. Please review.

Project coverage is 72.98%. Comparing base (ee7c35a) to head (67352d1).
Report is 3 commits behind head on main.

Files Patch % Lines
internal/rukpak/convert/registryv1.go 60.97% 11 Missing and 5 partials ⚠️
cmd/manager/main.go 50.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1050      +/-   ##
==========================================
- Coverage   73.46%   72.98%   -0.48%     
==========================================
  Files          32       28       -4     
  Lines        1986     1877     -109     
==========================================
- Hits         1459     1370      -89     
- Misses        368      375       +7     
+ Partials      159      132      -27     
Flag Coverage Δ
e2e 54.33% <68.85%> (-1.52%) ⬇️
unit 45.92% <9.83%> (+0.90%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@tmshort tmshort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions.

@@ -161,7 +158,12 @@ func main() {
}

cl := mgr.GetClient()
catalogClient := catalogclient.New(cl, cache.NewFilesystemCache(cachePath, httpClient))
catalogsCachePath := filepath.Join(cachePath, "catalogs")
if err := os.MkdirAll(catalogsCachePath, 0700); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing that the previous mode (for bundles) was 0755, does this make a difference?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't. We don't expect any other users or groups to read these files.

@@ -212,6 +212,9 @@ func checkForUnexpectedFieldChange(a, b ocv1alpha1.ClusterExtension) bool {
*/
//nolint:unparam
func (r *ClusterExtensionReconciler) reconcile(ctx context.Context, ext *ocv1alpha1.ClusterExtension) (ctrl.Result, error) {
l := log.FromContext(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want to put the name of the ClusterExtension here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, also attach the V(1) to this. e.g.

Suggested change
l := log.FromContext(ctx)
l := log.FromContext(ctx).V(1).AddValues("ClusterExtension", ext.GetName())

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is already part of the context provided in the context that arrives from the caller of Reconcile.

I thought about using V(1), but that would mean that no logging (with that logger at least) could ever happen at the INFO level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to suggest that I could do something like this:

l := log.FromContext(ctx)
debugLog := l.V(1)

But that would be misleading because levels build (i.e. .V(1).V(1) is equivalent to .V(2)), so if the caller set a .V(1) logger in the context, the this would suddenly not be a debug logger.

},
hash := sha256.Sum256(jsonData)
chrt.Templates = append(chrt.Templates, &chart.File{
Name: fmt.Sprintf("object-%x.json", hash[0:8]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this enough of the hash? Should it be longer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make it longer, but I didn't change that aspect of this code. The name is the same as before except for .json instead of .yaml.

return nil, fmt.Errorf("read %q: %v", e.Name(), err)
defer func() {
if err := manifestFile.Close(); err != nil {
l.Error(err, "error closing file", "path", path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be the only place l is used. Should l := log.FromContext(ctx) be moved down here for locality?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(or even into the WalkDir function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we did that, we'd have to repeatedly ask the context for the same logger over and over. It feels better (to me at least) to get the "logger to be used in the scope of this function" once, right at the very beginning of the function.

return nil, err
}
for _, e := range entries {
if err := fs.WalkDir(rv1, manifestsDir, func(path string, e fs.DirEntry, err error) error {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's another potential optimization here, where we spin up runtime.NumCPU goroutines and parse the files concurrently. But that can happen as a follow-up.

Copy link
Contributor

@tmshort tmshort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 15, 2024
@tmshort tmshort added this pull request to the merge queue Jul 15, 2024
Merged via the queue into operator-framework:main with commit bfc65ee Jul 15, 2024
16 of 18 checks passed
@joelanford joelanford deleted the faster-helm-chart-generation branch July 15, 2024 17:27
perdasilva pushed a commit to LalatenduMohanty/operator-controller that referenced this pull request Aug 13, 2024
perdasilva pushed a commit to kevinrizza/operator-controller that referenced this pull request Aug 13, 2024
@skattoju skattoju mentioned this pull request Sep 25, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants