Proposal: Adding more Prometheus metrics #2650
A big limitation of the current profiling of E2E latency is that there is no normalization for the number of tokens processed. There is probably nothing we could do to normalize this perfectly, but dividing E2E latency by the number of generation tokens would give a better-normalized metric, so something that expands on this would be good.
Thanks for your suggestion!
Oh sorry those are completely separate and should have been two bullet points |
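To make the normalization idea above concrete, here is a minimal sketch of dividing E2E latency by the generation-token count. The function name and signature are purely illustrative, not part of vLLM's actual API; how vLLM would surface these two values to the metrics layer is exactly what the proposal leaves open.

```python
# Hypothetical helper, not vLLM's API: normalize end-to-end latency
# by the number of generation tokens the request produced.
def normalized_e2e_latency(e2e_latency_s: float, num_generation_tokens: int) -> float:
    """Return seconds of end-to-end latency per generated token."""
    if num_generation_tokens <= 0:
        # A request with zero generated tokens cannot be normalized this way.
        raise ValueError("request produced no generation tokens")
    return e2e_latency_s / num_generation_tokens
```

For example, a request that took 2.4 s end to end and generated 120 tokens would report 0.02 s per token, which is more comparable across requests of different lengths than the raw 2.4 s figure.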
Once #2316 is merged, I'm willing to contribute the following metrics which I believe would be helpful for monitoring the usage of vllm.
stop|length
Notes:
Metrics 5 and 6 already exist, but as counters (`vllm:prompt_tokens_total` and `vllm:generation_tokens_total`). I think a Histogram is more meaningful. For backward compatibility, we can keep both types (counters and histograms).

Please let me know what you think.
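As a sketch of the "keep both types" idea, the snippet below uses the `prometheus_client` library to register a Counter and a Histogram side by side for generation tokens. The metric names echo the ones mentioned in the issue, but the histogram name, the bucket edges, and the recording function are assumptions for illustration, not vLLM's actual metric definitions.

```python
from prometheus_client import Counter, Histogram

# Existing-style counter: monotonically increasing total of generated tokens.
# prometheus_client appends "_total" to Counter names on export.
generation_tokens_total = Counter(
    "vllm:generation_tokens",
    "Total number of generation tokens processed.",
)

# Proposed histogram: per-request distribution of generated tokens.
# Bucket boundaries here are illustrative guesses, not vLLM's choices.
generation_tokens_per_request = Histogram(
    "vllm:generation_tokens_per_request",
    "Number of generation tokens per finished request.",
    buckets=(1, 8, 32, 128, 512, 2048),
)

def record_finished_request(num_generation_tokens: int) -> None:
    """Update both metric types so existing counter-based dashboards keep working."""
    generation_tokens_total.inc(num_generation_tokens)
    generation_tokens_per_request.observe(num_generation_tokens)
```

The counter preserves existing dashboards and rate() queries, while the histogram additionally exposes `_bucket`, `_sum`, and `_count` series, so per-request distributions and quantile estimates become possible without breaking anything.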