-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Track git provider API usage metrics #2005
base: main
Are you sure you want to change the base?
Conversation
35c8495
to
a45829a
Compare
@aThorp96 have you checked that controller is emitting metrics?, I checked and it doesn't. and also controller and watcher both call git providers APIs, so both are emitting metrics so it needs for users to configure metrics for both controller and watcher and that is big change. and if you configure metrics for both, sum of both is the number of API calls on a git event 🙁. |
@aThorp96 also controller is a "knative adapter" and watcher is "knative reconciler" so I am not sure whether it is possible to emit metrics from adapter or not (it could be a topic for you) |
@aThorp96 yeah, I saw metrics emitted from controller, I was doing it before in wrong way. |
@aThorp96 but still there is question about two metrics services, (watcher, controller)? |
return err | ||
} | ||
|
||
ctx, err := tag.New( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aThorp96 at the moment metrics have event tag but user may want to filter out metrics based on event SHA, so it would be better to add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Event sha could be useful for debugging. I worry about the cardinality of tags though if we introduce the SHA as a tag. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, cardinality could be issue but there is no other way to differentiate that how many API calls were there on an event. for example it has eventType tag but there could be many push events so API request count in this metric will be cumulative in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. I think for that use-case may be better suited for tracing, replaying an event, using a unit or e2e test.
The way I see this being used is to first identify particularly heavy event types and repositories. I think that is useful for an SRE alone, but if an SRE wanted to know how many API calls were made for a given event, I think there are more appropriate tools/techniques than metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. I think for that use-case may be better suited for tracing, replaying an event, using a unit or e2e test.
I don't think that user may want to run e2e tests in production 🙂
a45829a
to
9f912e4
Compare
Requiring the client object be accessed via an accessor method allows for hooking into API usage for things like API metrics, since the github-client library API is designed in a way difficult to decorate.
Every time we access a Git provider's API we may use some of that provider's API limits. Medium-sized providers may have many repositories using PaC but may not have the computer overhead to support permissive API rate limits. In some cases we have seen pipelines delayed due to API rate limiting. By tracking the Git providers' API load by PaC we can gain insight into potential redundancies, hot spots, and optimizations to the Git provider API use. Since the Git provider API clients do not expose these metrics and are not designed in a way that is easy to decorate, the closest proxy to tracking actual API calls is tracking accesses to the API client itself. Since one client-method call equates to one API call, this metric should track the API usage well. However if a developer saves a reference to the API client then they will be able to make API calls without the metric incrementing; we will have to watch out for this if we want the metric to be accurate.
9f912e4
to
7704ffb
Compare
@zakisk per our discussion, I added some documentation regarding the git-provider metrics being emitted from both services. Here is a screenshot of one of the example queries I included, taken after running some of our end to end tests |
/retest |
Description: gitProviderAPIRequestCount.Description(), | ||
Measure: gitProviderAPIRequestCount, | ||
Aggregation: view.Count(), | ||
TagKeys: []tag.Key{R.provider, R.eventType, R.namespace, R.repository}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would that give the ability to track each provider separately ? i mean is the admin can get insight how many calls has been done for gitlab and for github app when both are used on the same cluster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chmouel Yes, this metric can be filtered using provider
tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. Additionally the provider tag for Gitlab and Gitea will use the API URL so you can see insight per provider-instance. Similarly Github will use either github
or github-enterprise
respectively
return v.bbClient | ||
} | ||
|
||
func (v *Provider) recordAPIUsageMetrics() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aThorp96 this is implemented for every provider and seems repeated code, can you find a way for a common logic, wdyt?
if v.metrics == nil { | ||
m, err := metrics.NewRecorder() | ||
if err != nil { | ||
v.Logger.Errorf("Error initializing bitbucketcloud metrics recorder: %v", err) | ||
return | ||
} | ||
v.metrics = m | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aThorp96 like doing this in a new method on provider interface e.g. GetMetrics
and the using that method for emitting metrics from a single common function.
namespace = v.repo.Namespace | ||
} | ||
|
||
if err := v.metrics.ReportGitProviderAPIUsage("bitbucketcloud", v.triggerEvent, namespace, name); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes
Every time we access a Git provider's API we may use some of that
provider's API limits. Medium-sized providers may have many repositories
using PaC but may not have the computer overhead to support permissive
API rate limits. In some cases we have seen pipelines delayed due to API
rate limiting.
By tracking the Git providers' API load by PaC we can gain insight into
potential redundancies, hot spots, and optimizations to the Git provider
API use.
Since the Git provider API clients do not expose these metrics and are
not designed in a way that is easy to decorate, the closest proxy to
tracking actual API calls is tracking accesses to the API client itself.
Since one client-method call equates to one API call, this metric should
track the API usage well. However if a developer saves a reference to the
API client then they will be able to make API calls without the metric
incrementing; we will have to watch out for this if we want the metric
to be accurate.
issue: #1925
Submitter Checklist
📝 Ensure your commit message is clear and informative. Refer to the How to write a git commit message guide. Include the commit message in the PR body rather than linking to an external site (e.g., Jira ticket).
♽ Run make test lint before submitting a PR to avoid unnecessary CI processing. Consider installing pre-commit and running pre-commit install in the repository root for an efficient workflow.
✨ We use linters to maintain clean and consistent code. Run make lint before submitting a PR. Some linters offer a --fix mode, executable with make fix-linters (ensure markdownlint and golangci-lint are installed).
📖 Document any user-facing features or changes in behavior.
🧪 While 100% coverage isn't required, we encourage unit tests for code changes where possible.
🎁 If feasible, add an end-to-end test. See README for details.
🔎 Address any CI test flakiness before merging, or provide a valid reason to bypass it (e.g., token rate limitations).
If adding a provider feature, fill in the following details:
(update the documentation accordingly)