feat: Track git provider API usage metrics #2005

Open (wants to merge 3 commits into base: main)

@aThorp96 (Contributor) commented Mar 18, 2025

Changes

Every time we access a Git provider's API, we may consume some of that
provider's API rate limit. Medium-sized providers may have many repositories
using PaC but may not have the compute capacity to support permissive
API rate limits. In some cases, we have seen pipelines delayed due to API
rate limiting.

By tracking the API load PaC places on Git providers, we can gain insight into
potential redundancies, hot spots, and optimizations in PaC's Git provider
API use.

Because the Git provider API clients do not expose these metrics and are
not designed in a way that is easy to decorate, the closest proxy for
tracking actual API calls is tracking accesses to the API client itself.
Since one client-method call equates to one API call, this metric should
track API usage well. However, if a developer saves a reference to the
API client, they can make API calls without the metric
incrementing; we will have to watch for this if we want the metric
to stay accurate.
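A minimal, stdlib-only Go sketch of the accessor approach described above, including the caveat about saved references. All names here (`apiClient`, `GetFile`, `usageRecorder`, `Provider.Client`) are hypothetical stand-ins rather than PaC's actual code; the real providers wrap the vendor SDK clients and report through the PR's metrics recorder instead of an in-memory map.

```go
package main

import (
	"fmt"
	"sync"
)

// apiClient is a hypothetical stand-in for a provider SDK client;
// each method call represents one API request.
type apiClient struct{}

func (c *apiClient) GetFile(path string) string { return "contents of " + path }

// usageRecorder is a hypothetical, concurrency-safe counter keyed by provider.
type usageRecorder struct {
	mu     sync.Mutex
	counts map[string]int
}

func (r *usageRecorder) inc(provider string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.counts[provider]++
}

// Provider keeps its client unexported so callers must go through Client().
type Provider struct {
	name    string
	client  *apiClient
	metrics *usageRecorder
}

// Client records one API usage and returns the underlying client. This is a
// proxy: it assumes each accessor call is followed by exactly one API call.
func (p *Provider) Client() *apiClient {
	p.metrics.inc(p.name)
	return p.client
}

func main() {
	rec := &usageRecorder{counts: map[string]int{}}
	p := &Provider{name: "github", client: &apiClient{}, metrics: rec}

	p.Client().GetFile("a.yaml") // counted
	p.Client().GetFile("b.yaml") // counted

	// The pitfall from the description: saving a reference bypasses the
	// accessor, so only the first access below is counted.
	c := p.Client() // counted
	c.GetFile("c.yaml")
	c.GetFile("d.yaml") // not counted

	fmt.Println(rec.counts["github"]) // 3, although 4 API calls were made
}
```

Keeping the client field unexported is what makes the accessor the only path from outside the provider package; code inside the package can still cache it, which is the case reviewers would need to watch for.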

issue: #1925

Submitter Checklist

  • 📝 Ensure your commit message is clear and informative. Refer to the How to write a git commit message guide. Include the commit message in the PR body rather than linking to an external site (e.g., Jira ticket).

  • ♽ Run make test lint before submitting a PR to avoid unnecessary CI processing. Consider installing pre-commit and running pre-commit install in the repository root for an efficient workflow.

  • ✨ We use linters to maintain clean and consistent code. Run make lint before submitting a PR. Some linters offer a --fix mode, executable with make fix-linters (ensure markdownlint and golangci-lint are installed).

  • 📖 Document any user-facing features or changes in behavior.

  • 🧪 While 100% coverage isn't required, we encourage unit tests for code changes where possible.

  • 🎁 If feasible, add an end-to-end test. See README for details.

  • 🔎 Address any CI test flakiness before merging, or provide a valid reason to bypass it (e.g., token rate limitations).

  • If adding a provider feature, fill in the following details:

Git Provider | Supported
GitHub App | ✅️
GitHub Webhook | ✅️
Gitea | ✅️
GitLab | ✅️
Bitbucket Cloud | ✅️
Bitbucket Data Center | ✅️

(update the documentation accordingly)

@aThorp96 changed the title from "Git api usage metrics" to "feat: Track git provider API usage metrics" Mar 18, 2025
@aThorp96 force-pushed the git_api_usage_metrics branch 2 times, most recently from 35c8495 to a45829a, March 18, 2025 15:55
@aThorp96 marked this pull request as ready for review March 18, 2025 16:58
@aThorp96 requested review from chmouel and zakisk March 18, 2025 16:58
@zakisk (Contributor) commented Mar 19, 2025

@aThorp96 have you checked that the controller is emitting metrics? I checked and it doesn't. Also, the controller and the watcher both call Git provider APIs, so both emit metrics. That means users need to configure metrics for both the controller and the watcher, which is a big change. And if you configure metrics for both, the number of API calls for a Git event is the sum of the two 🙁.
cc: @chmouel

[Screenshot from 2025-03-19 14-51-19]

@zakisk (Contributor) commented Mar 19, 2025

@aThorp96 also, the controller is a "knative adapter" and the watcher is a "knative reconciler", so I am not sure whether it is possible to emit metrics from an adapter at all (this could be something for you to look into).

@zakisk (Contributor) commented Mar 20, 2025

@aThorp96 yeah, I now see metrics emitted from the controller; I was checking the wrong way before.

@zakisk (Contributor) commented Mar 20, 2025

@aThorp96 but there is still the open question of the two metrics services (watcher and controller).

return err
}

ctx, err := tag.New(
Contributor

@aThorp96 at the moment the metric has an event-type tag, but users may want to filter metrics by event SHA, so it would be good to add it.
[Screenshot from 2025-03-20 12-43-33]

Contributor Author

The event SHA could be useful for debugging. I worry about tag cardinality, though, if we introduce the SHA as a tag. WDYT?

@zakisk (Contributor) Mar 26, 2025

Yeah, cardinality could be an issue, but otherwise there is no way to tell how many API calls were made for a single event. For example, the metric has an eventType tag, but there can be many push events, so the API request count in this metric will be cumulative across all of them.
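To make the trade-off in this exchange concrete, here is a stdlib-only Go sketch that groups recordings into series the way a count aggregation groups by tag values. The `recording` type and `seriesCounts` helper are hypothetical; only the tag names follow the review. Without a SHA tag, several push events collapse into one cumulative series; with it, every event adds a new series, which is the growth behind the cardinality concern.

```go
package main

import (
	"fmt"
	"strings"
)

// recording is one observed API call (hypothetical shape).
type recording struct {
	provider, eventType, repository, sha string
}

// seriesCounts groups recordings by the chosen tag values, as a Count
// aggregation would; optionally the commit SHA is included as a tag.
func seriesCounts(recs []recording, withSHA bool) map[string]int {
	out := map[string]int{}
	for _, r := range recs {
		key := []string{r.provider, r.eventType, r.repository}
		if withSHA {
			key = append(key, r.sha)
		}
		out[strings.Join(key, "|")]++
	}
	return out
}

func main() {
	// Three push events on one repository, each making 2 API calls.
	var recs []recording
	for _, sha := range []string{"a1", "b2", "c3"} {
		for i := 0; i < 2; i++ {
			recs = append(recs, recording{"github", "push", "ns/repo", sha})
		}
	}
	fmt.Println(len(seriesCounts(recs, false))) // 1 series with a cumulative count of 6
	fmt.Println(len(seriesCounts(recs, true)))  // 3 series, one per commit SHA
}
```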

Contributor Author

That's fair. I think that use case may be better served by tracing, or by replaying the event in a unit or e2e test.

I see this metric being used primarily to identify particularly heavy event types and repositories. That alone is useful for an SRE, but if an SRE wanted to know how many API calls were made for a given event, I think there are more appropriate tools and techniques than metrics.

Contributor

I don't think users will want to run e2e tests in production 🙂

Requiring that the client object be accessed via an accessor method allows
hooking into API usage for things like API metrics, since the
GitHub client library's API is designed in a way that is difficult to decorate.

@aThorp96 (Contributor Author)

@zakisk per our discussion, I added some documentation about the git-provider metrics being emitted from both services. Here is a screenshot of one of the example queries I included, taken after running some of our end-to-end tests:
[Screenshot: 20250321_16h26m41s_grim]

@osp-pac added the e2e label Mar 26, 2025
@chmouel (Member) commented Mar 26, 2025

/retest

Description: gitProviderAPIRequestCount.Description(),
Measure: gitProviderAPIRequestCount,
Aggregation: view.Count(),
TagKeys: []tag.Key{R.provider, R.eventType, R.namespace, R.repository},
Member

Would this make it possible to track each provider separately? I mean, can an admin see how many calls have been made to GitLab and to the GitHub App when both are used on the same cluster?

Contributor

@chmouel Yes, this metric can be filtered using the provider tag.

@aThorp96 (Contributor Author) Mar 27, 2025

Exactly. Additionally, the provider tag for GitLab and Gitea uses the API URL, so you can get insight per provider instance. Similarly, GitHub uses either github or github-enterprise, depending on the instance.

@aThorp96 requested review from chmouel and zakisk March 28, 2025 12:02
return v.bbClient
}

func (v *Provider) recordAPIUsageMetrics() {
Contributor

@aThorp96 this is implemented for every provider and looks like repeated code. Can you find a way to share the common logic? WDYT?

Comment on lines +44 to +51
if v.metrics == nil {
	m, err := metrics.NewRecorder()
	if err != nil {
		v.Logger.Errorf("Error initializing bitbucketcloud metrics recorder: %v", err)
		return
	}
	v.metrics = m
}
Contributor

@aThorp96 for example, by adding a new method to the provider interface, e.g. GetMetrics, and then using that method to emit metrics from a single common function.
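One way the suggested refactor could look, as a stdlib-only Go sketch: each provider exposes what the shared function needs, and a single `recordAPIUsage` does the reporting. `GetMetrics` is the name floated in this review and `ReportGitProviderAPIUsage` mirrors the recorder call in the diff; `ProviderName`, `EventType`, `RepoRef`, and the fake types are hypothetical. Sourcing the name from one per-provider place (such as the existing `GetConfig`) would also keep tag values and error messages consistent across providers.

```go
package main

import (
	"errors"
	"fmt"
)

// metricsReporter mirrors the shape of the recorder call used in the diff.
type metricsReporter interface {
	ReportGitProviderAPIUsage(provider, eventType, namespace, repository string) error
}

// apiUsageSource is a hypothetical subset of the provider interface.
type apiUsageSource interface {
	ProviderName() string        // e.g. taken from each provider's config
	GetMetrics() metricsReporter // nil if no recorder is available
	EventType() string
	RepoRef() (namespace, name string)
}

// recordAPIUsage is the single shared implementation every provider would call.
// Note: a typed nil pointer wrapped in the interface would not equal nil here;
// this sketch assumes providers return an untyped nil.
func recordAPIUsage(p apiUsageSource) error {
	m := p.GetMetrics()
	if m == nil {
		return errors.New(p.ProviderName() + ": metrics recorder not initialised")
	}
	ns, name := p.RepoRef()
	return m.ReportGitProviderAPIUsage(p.ProviderName(), p.EventType(), ns, name)
}

// fakeRecorder records calls so the behaviour can be inspected.
type fakeRecorder struct{ calls []string }

func (f *fakeRecorder) ReportGitProviderAPIUsage(provider, eventType, namespace, repository string) error {
	f.calls = append(f.calls, fmt.Sprintf("%s/%s/%s/%s", provider, eventType, namespace, repository))
	return nil
}

// fakeProvider is a hypothetical provider implementing apiUsageSource.
type fakeProvider struct {
	name string
	rec  metricsReporter
}

func (f fakeProvider) ProviderName() string        { return f.name }
func (f fakeProvider) GetMetrics() metricsReporter { return f.rec }
func (f fakeProvider) EventType() string           { return "push" }
func (f fakeProvider) RepoRef() (string, string)   { return "ns", "repo" }

func main() {
	rec := &fakeRecorder{}
	if err := recordAPIUsage(fakeProvider{name: "bitbucket-cloud", rec: rec}); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(rec.calls)                                  // [bitbucket-cloud/push/ns/repo]
	fmt.Println(recordAPIUsage(fakeProvider{name: "gitea"})) // gitea: metrics recorder not initialised
}
```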

namespace = v.repo.Namespace
}

if err := v.metrics.ReportGitProviderAPIUsage("bitbucketcloud", v.triggerEvent, namespace, name); err != nil {
Contributor

@aThorp96 for the provider name, you already have a GetConfig function implemented for every provider, and using it would keep the naming convention uniform; for example, bitbucket-cloud has a hyphen there. The same applies to the error message above (line 47 in this file).

4 participants