-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow partial metric updates #561
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
/assign @ahg-g |
/approve We need to flag stale metrics without completely spamming the the logs |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ahg-g, liu-cong The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Right. Today these logs will print as TRACE level so by default you shouldn't see them. What do you think about this:
|
This PR allows partial metric updates even if the PodMetricsClient returns an error due to failure in processing a subset of the metrics. A partial update is considered better than no update.
A practical issue this fixes is that in vllm v1, if LoRA adapter is not enabled, the lora metrics won't even show up. Therefore the metrics won't get refreshed.
This also helps with any transient error in missing a metric.
Note1: For the LoRA metric, technically we can have a flag to indicate whether we can skip scraping it. That can be a separate followup.
Note2: There should be a separate "conformance test" effort to make sure the supported model server emit the metrics required by our protocol. The PodMetricsClient shouldn't be responsible for that. Therefore it's safe to optimistically allow partial updates.