-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
expose "Normalized Time Per Output Token" (NTPOT) metric #643
base: main
Are you sure you want to change the base?
Conversation
Hi @kaushikmitr. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
/ok-to-test |
pkg/epp/metrics/metrics.go
Outdated
// RecordLatencyPerOutputToken (NTPOT) records the normalized time per output token. | ||
func RecordLatencyPerOutputToken(ctx context.Context, modelName, targetModelName string, received time.Time, complete time.Time, outputTokenCount int) bool { | ||
if !complete.After(received) { | ||
log.FromContext(ctx).V(logutil.DEFAULT).Error(nil, "Request latency values are invalid for NTPOT calculation", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.FromContext(ctx).V(logutil.DEFAULT).Error(nil, "Request latency values are invalid for NTPOT calculation", | |
log.FromContext(ctx).Error(nil, "Request latency values are invalid for NTPOT calculation", |
pkg/epp/metrics/metrics.go
Outdated
} | ||
|
||
if outputTokenCount <= 0 { | ||
log.FromContext(ctx).V(logutil.DEFAULT).Error(nil, "Output token count must be positive for NTPOT calculation", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.FromContext(ctx).V(logutil.DEFAULT).Error(nil, "Output token count must be positive for NTPOT calculation", | |
log.FromContext(ctx).Error(nil, "Output token count must be positive for NTPOT calculation", |
pkg/epp/handlers/streamingserver.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add this to server.go as well?
/lgtm |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: kaushikmitr, liu-cong The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
/lgtm |
/lgtm |
Regarding the metrics name, can we align it with vllm naming convention: https://github.com/vllm-project/vllm/blob/main/docs/source/design/v1/metrics.md?plain=1#L37
|
New changes are detected. LGTM label has been removed. |
updated metric name from ntpot to normalized time per output token |
This pull request introduces a new metric to record the normalized time per output token (NTPOT) for inference models. The changes span across multiple files to implement this new metric and include updates to the metrics recording, registration, and testing.
New Metric Implementation:
pkg/epp/metrics/metrics.go
: Added a new histogram metricNormalizedTimePerOutputToken
to record the normalized time per output token for each model and target model.pkg/epp/metrics/metrics.go
: Implemented theRecordNormalizedTimePerOutputToken
function to calculate and record the normalized time per output token.pkg/epp/metrics/metrics.go
: Registered the newNormalizedTimePerOutputToken
metric.Integration with Existing Code:
pkg/epp/handlers/server.go
andpkg/epp/handlers/streamingserver.go
: Added calls toRecordNormalizedTimePerOutputToken
to record the metric when processing requests.Testing:
pkg/epp/metrics/metrics_test.go
: Added test cases forRecordNormalizedTimePerOutputToken
to verify the correct recording of the metric under various scenarios.pkg/epp/metrics/testdata/normalized_time_per_output_token_seconds_metric
: Added test data for the new metric.Documentation:
site-src/guides/metrics.md
: Updated the metrics documentation to include the newntpot_seconds
metric.Tested by creating the following visualization
Added NTPO Inference Gateway Metric Dashboard
