Proposal: Adding more Prometheus metrics #2650

Closed

ronensc opened this issue Jan 29, 2024 · 3 comments

Comments

@ronensc (Contributor) commented Jan 29, 2024

Once #2316 is merged, I'm willing to contribute the following metrics, which I believe would be helpful for monitoring the usage of vLLM.

| # | Metric | Type | Labels | Description |
|---|--------|------|--------|-------------|
| 1 | `vllm:request_success` | Counter | `finish_reason=stop\|length` | Count of successfully processed requests. |
| 2 | `vllm:request_params_max_tokens` | Histogram | | Value of the `max_tokens` request parameter. |
| 3 | `vllm:request_params_n` | Histogram | | Value of the `n` request parameter. |
| 4 | `vllm:request_total_tokens` | Histogram | | Total sequence length of the request (input tokens + generated tokens). |
| 5 | `vllm:request_prompt_tokens` | Histogram | | Number of prefill tokens processed. |
| 6 | `vllm:request_generation_tokens` | Histogram | | Number of generation tokens processed. |

Notes:
Metrics 5 and 6 already exist, but as counters (`vllm:prompt_tokens_total` and `vllm:generation_tokens_total`). I think a histogram is more meaningful. For backward compatibility, we can keep both types (counters and histograms).
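For illustration only, here is a minimal sketch of how these metrics could be declared with the standard `prometheus_client` library. The metric names follow the table above; the bucket boundaries and surrounding code are assumptions of mine, not part of this proposal:

```python
# Illustrative sketch only -- not vLLM's implementation.
# Bucket boundaries are placeholders chosen for readability.
from prometheus_client import Counter, Histogram

# 1. Counter labeled by how the request finished ("stop" or "length").
request_success = Counter(
    "vllm:request_success",
    "Count of successfully processed requests.",
    labelnames=["finish_reason"],
)

# 2./3. Histograms over request parameters.
request_params_max_tokens = Histogram(
    "vllm:request_params_max_tokens",
    "Value of the max_tokens request parameter.",
    buckets=(1, 32, 128, 512, 2048, 8192),  # placeholder buckets
)
request_params_n = Histogram(
    "vllm:request_params_n",
    "Value of the n request parameter.",
    buckets=(1, 2, 5, 10, 20),  # placeholder buckets
)

# 4.-6. Histograms over token counts; prompt and generation token
# histograms would follow the same pattern as this one.
request_total_tokens = Histogram(
    "vllm:request_total_tokens",
    "Total sequence length of the request (input + generated tokens).",
    buckets=(64, 256, 1024, 4096, 16384),  # placeholder buckets
)
```

On request completion the engine would then do something like `request_success.labels(finish_reason="stop").inc()` and `request_total_tokens.observe(num_prompt_tokens + num_generation_tokens)`.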

Please let me know what you think.

@robertgshaw2-redhat (Collaborator) commented Jan 29, 2024

  • Another thing would be the number of aborted requests (aborted_requests).

A big limitation of the current profiling of E2E latency is that there is no normalization by the number of tokens processed. There is probably nothing we could do to normalize this perfectly, but dividing E2E latency by the number of generation tokens would give a better normalized metric, so something that expands on this would be good.
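To make the idea concrete, here is a rough sketch of observing E2E latency normalized by the number of generation tokens. It assumes `prometheus_client`; the metric name, buckets, and helper function are hypothetical, not anything that exists in vLLM:

```python
# Hypothetical sketch of the normalization idea above; the metric name and
# bucket boundaries are made up for illustration.
from prometheus_client import Histogram

e2e_latency_per_generation_token = Histogram(
    "vllm:e2e_request_latency_per_generation_token_seconds",  # hypothetical name
    "E2E request latency divided by the number of generation tokens.",
    buckets=(0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0),  # placeholder buckets
)

def record_normalized_latency(e2e_latency_s: float, num_generation_tokens: int) -> None:
    # Skip requests that produced no tokens to avoid division by zero.
    if num_generation_tokens > 0:
        e2e_latency_per_generation_token.observe(e2e_latency_s / num_generation_tokens)
```

Observing the ratio per request preserves the per-request distribution; dividing the separate latency and token histograms at query time would only give an aggregate average.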

@ronensc (Contributor, Author) commented Jan 31, 2024

Thanks for your suggestion! An aborted_requests metric sounds like an important addition. However, I'm not entirely clear on how it contributes to normalizing E2E latency by the number of tokens processed. Could you please provide more details?

@robertgshaw2-redhat (Collaborator) commented

Oh sorry, those are completely separate and should have been two bullet points.
