site-src/guides/metrics.md (+14 −8)
@@ -4,14 +4,7 @@ This guide describes the current state of exposed metrics and how to scrape them
 
 ## Requirements
 
-Response metrics are only supported in non-streaming mode, with the follow up [issue](https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/178) to address streaming mode.
-
-Currently there are two options:
-- If requests don't use response streaming, then you can enable `Buffered` mode for response in `EnvoyExtensionPolicy`, this will buffer the response body at the proxy and forward it to the endpoint picker, which allows the endpoint picker to report response metrics.
-
-- If requests use response streaming, then it is not recommended to enable `Buffered` mode, the response body processing mode should be left empty in the `EnvoyExtensionPolicy` (default). In this case response bodies will not be forwarded to the endpoint picker, and therefore response metrics will not be reported.
-
-
+To have response metrics, set the body mode to `Buffered` or `Streamed`:
 ```
 apiVersion: gateway.envoyproxy.io/v1alpha1
 kind: EnvoyExtensionPolicy
@@ -32,6 +25,19 @@ spec:
     body: Buffered
 ```
 
+If you want to include usage metrics for vLLM model server streaming requests, send the request with `include_usage`:
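For context, the `body: Buffered` line above sits inside the `extProc` processing mode of the policy. A minimal sketch of what a complete `EnvoyExtensionPolicy` might look like follows; the metadata names, endpoint-picker service name, port, and target route are hypothetical placeholders, not values taken from this guide, and the exact schema (e.g. `targetRef` vs. `targetRefs`) should be checked against your Envoy Gateway version:

```
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyExtensionPolicy
metadata:
  # "ext-proc-policy" and "default" are hypothetical placeholder names
  name: ext-proc-policy
  namespace: default
spec:
  extProc:
    - backendRefs:
        # hypothetical endpoint-picker service and port
        - name: endpoint-picker
          port: 9002
      processingMode:
        response:
          # Buffered (or Streamed) forwards response bodies to the
          # endpoint picker, enabling response metrics
          body: Buffered
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    # hypothetical route name
    name: llm-route
```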
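With usage reporting enabled, a streaming response ends with a final chunk carrying token counts that the endpoint picker can surface as metrics. A sketch of such a request body, assuming the OpenAI-compatible completions API that vLLM exposes, where `include_usage` lives under `stream_options` (model name and prompt are placeholders):

```
{
  "model": "my-model",
  "prompt": "Hello",
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}
```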