Refactor Prometheus and Add Request Level Metrics #2316

Merged

52 commits merged into vllm-project:main from rs/feature/metrics on Jan 31, 2024.

Commits (52)
- `56d398b` added first refactor of metrics (Dec 30, 2023)
- `2239e73` refactored to use counters rather than gauges for monotonically incre… (Dec 30, 2023)
- `1e6ad74` added dev notebook, started running live (Dec 30, 2023)
- `10d5353` first example where things did not completely break :)! (Dec 31, 2023)
- `5199cdd` end to end things seem to be working (Dec 31, 2023)
- `f69f639` logging properly to prom, seeing the metrics come up (Dec 31, 2023)
- `0c24fc3` added full example setting up prom/grafana logging, including default… (Dec 31, 2023)
- `874df77` removed local logging (Dec 31, 2023)
- `63aecbc` stashing refactor to stateless loggers (Jan 1, 2024)
- `7782baf` refactored code to support stateless iteration logging; this made eve… (Jan 1, 2024)
- `92dda00` missed metrics file :) (Jan 1, 2024)
- `6e7f715` made seq_group implementation simplier (Jan 1, 2024)
- `67aaed7` updated metrics page ordering (Jan 1, 2024)
- `69093b2` updated formatting / type checking (Jan 4, 2024)
- `114a4c9` updated api server to support /metrics so I could run performance ben… (Jan 4, 2024)
- `f68f4f7` Update async_llm_engine.py (robertgshaw2-redhat, Jan 4, 2024)
- `ce0534f` quality (Jan 5, 2024)
- `05b3206` Merge branch 'vllm-project:main' into rs/feature/metrics (robertgshaw2-redhat, Jan 5, 2024)
- `450dfc2` cleaned up to use only one Stat type; added other metric (Jan 5, 2024)
- `0e65765` quality (Jan 5, 2024)
- `9fee85f` Update outputs.py (robertgshaw2-redhat, Jan 5, 2024)
- `9cdd6c4` reverted changes to api_server.py (Jan 5, 2024)
- `d1dcac6` removed line to match base (Jan 5, 2024)
- `a42c3ca` stash to move to other machine (Jan 6, 2024)
- `e2207db` factored per simon-mo request (Jan 6, 2024)
- `a90d447` readded files (Jan 6, 2024)
- `567d32f` fixed bugs in initial version (Jan 6, 2024)
- `f363df7` e2e functional testing complete (Jan 6, 2024)
- `519581b` readded images (Jan 6, 2024)
- `551a3c0` quality (Jan 6, 2024)
- `1f38f15` Merge branch 'main' into rs/feature/metrics (robertgshaw2-redhat, Jan 7, 2024)
- `32d2259` fixed merge issue (Jan 7, 2024)
- `14f36ae` Update grafana.json (robertgshaw2-redhat, Jan 7, 2024)
- `6a81d89` updated with simplier more direct implementation (Jan 20, 2024)
- `3a28d48` smoke test confirms changes are working (Jan 20, 2024)
- `9c465c7` Merge branch 'main' into rs/feature/metrics (robertgshaw2-redhat, Jan 20, 2024)
- `d800e7f` simplified example (Jan 20, 2024)
- `f117808` format (Jan 20, 2024)
- `30e88e4` Update benchmark_serving.py (robertgshaw2-redhat, Jan 20, 2024)
- `6bbba50` Update llm_engine.py (robertgshaw2-redhat, Jan 20, 2024)
- `3cff058` Update README.md (robertgshaw2-redhat, Jan 20, 2024)
- `38578d3` Update README.md (robertgshaw2-redhat, Jan 20, 2024)
- `629e1d3` Update examples/production_monitoring/README.md (robertgshaw2-redhat, Jan 24, 2024)
- `cef0432` Update examples/production_monitoring/README.md (robertgshaw2-redhat, Jan 24, 2024)
- `dc4eaa5` Update vllm/engine/llm_engine.py (robertgshaw2-redhat, Jan 24, 2024)
- `d517924` Update vllm/sequence.py (robertgshaw2-redhat, Jan 24, 2024)
- `0b726c5` fixes simon's concerns and validates working properly. renames metric… (Jan 26, 2024)
- `9b76d60` Merge branch 'main' into rs/feature/metrics (robertgshaw2-redhat, Jan 26, 2024)
- `6fed96c` format (Jan 26, 2024)
- `7f1379b` new line (Jan 26, 2024)
- `3c18cb5` new line (Jan 26, 2024)
- `6b9afa2` confirmed everything is working e2e (Jan 26, 2024)

examples/production_monitoring/README.md (54 additions, 0 deletions)
@@ -0,0 +1,54 @@
# vLLM + Prometheus/Grafana

This is a simple example that shows you how to connect vLLM metric logging to the Prometheus/Grafana stack. For this example, we launch Prometheus and Grafana via Docker; other setup methods are described on the [Prometheus](https://prometheus.io/) and [Grafana](https://grafana.com/) websites.

Install:
- [`docker`](https://docs.docker.com/engine/install/)
- [`docker compose`](https://docs.docker.com/compose/install/linux/#install-using-the-repository)

### Launch

Prometheus metric logging is enabled by default in the OpenAI-compatible server. Launch via the entrypoint:
```bash
python3 -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-7B-v0.1 \
    --max-model-len 2048 \
    --disable-log-requests
```

Launch Prometheus and Grafana servers with `docker compose`:
```bash
docker compose up
```
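
The stack can also run in the background while you generate traffic; a small sketch using standard `docker compose` flags:

```bash
# run Prometheus + Grafana detached
docker compose up -d

# follow their logs while debugging
docker compose logs -f prometheus grafana
```

Prometheus's own UI at [`http://localhost:9090/targets`](http://localhost:9090/targets) shows whether the vLLM endpoint is being scraped.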

Submit some sample requests to the server:
```bash
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json

python3 ../../benchmarks/benchmark_serving.py \
    --model mistralai/Mistral-7B-v0.1 \
    --tokenizer mistralai/Mistral-7B-v0.1 \
    --endpoint /v1/completions \
    --dataset ShareGPT_V3_unfiltered_cleaned_split.json \
    --request-rate 3.0
```
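
If you only want a few data points rather than a full benchmark run, a single request against the OpenAI-compatible completions endpoint also works (a minimal sketch; adjust the prompt and `max_tokens` as you like):

```bash
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mistralai/Mistral-7B-v0.1",
        "prompt": "San Francisco is a",
        "max_tokens": 64
    }'
```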

Navigating to [`http://localhost:8000/metrics`](http://localhost:8000/metrics) will show the raw Prometheus metrics being exposed by vLLM.
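
For a quick look from the command line, you can filter the exposition for the vLLM series (a sketch; the exact metric names depend on your vLLM version):

```bash
# dump the raw Prometheus exposition format and keep only vLLM's own series
curl -s http://localhost:8000/metrics | grep '^vllm'
```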

### Grafana Dashboard

Navigate to [`http://localhost:3000`](http://localhost:3000). Log in with the default username (`admin`) and password (`admin`).

#### Add Prometheus Data Source

Navigate to [`http://localhost:3000/connections/datasources/new`](http://localhost:3000/connections/datasources/new) and select Prometheus.

On the Prometheus configuration page, add the `Prometheus Server URL` under `Connection`. In this setup, Grafana and Prometheus run in separate containers, but Docker creates a DNS name for each container, so you can simply use `http://prometheus:9090`.

Click `Save & Test`. You should see a green check saying "Successfully queried the Prometheus API."
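
Optionally, you can also confirm that Prometheus is scraping the vLLM server before building dashboards by querying its HTTP API directly (a sketch; `up` is a built-in series that is 1 for every reachable scrape target):

```bash
curl -s 'http://localhost:9090/api/v1/query?query=up'
```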

#### Import Dashboard

Navigate to [`http://localhost:3000/dashboard/import`](http://localhost:3000/dashboard/import), upload `grafana.json`, and select the `prometheus` datasource. You should see a screen that looks like the following:

![Grafana Dashboard Image](https://i.imgur.com/R2vH9VW.png)
examples/production_monitoring/docker-compose.yaml (19 additions, 0 deletions)
@@ -0,0 +1,19 @@
# docker-compose.yaml
version: "3"

services:
  prometheus:
    image: prom/prometheus:latest
    extra_hosts:
      - "host.docker.internal:host-gateway" # allow a direct connection from container to the local machine
    ports:
      - "9090:9090" # the default port used by Prometheus
    volumes:
      - ${PWD}/prometheus.yaml:/etc/prometheus/prometheus.yml # mount Prometheus config file

  grafana:
    image: grafana/grafana:latest
    depends_on:
      - prometheus
    ports:
      - "3000:3000" # the default port used by Grafana