`site-src/guides/metrics.md`
This guide describes the current state of exposed metrics and how to scrape them.
## Requirements

For non-streaming requests, enable `Buffered` mode for the response in the `EnvoyExtensionPolicy`. This buffers the response body at the proxy and forwards it to the endpoint picker, which allows the endpoint picker to report response metrics:

```
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyExtensionPolicy
metadata:
  name: ext-proc-policy
  namespace: default
spec:
  extProc:
  - backendRefs:
    - group: ""
      kind: Service
      name: inference-gateway-ext-proc
      port: 9002
    processingMode:
      request:
        body: Buffered
      response:
        body: Buffered
```

For streaming requests, enable `Streamed` mode for the response in the `EnvoyExtensionPolicy`:

```
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyExtensionPolicy
metadata:
  name: ext-proc-policy
  namespace: default
spec:
  extProc:
  - backendRefs:
    - group: ""
      kind: Service
      name: inference-gateway-ext-proc
      port: 9002
    processingMode:
      request:
        body: Buffered
      response:
        body: Streamed
```

If you want to include usage metrics for the vLLM model server, send the request with `include_usage`:
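The guide's request example was not captured in this excerpt. As a sketch only, in the OpenAI-compatible API that vLLM serves, `include_usage` is set inside `stream_options`; the model name below is a placeholder, not taken from this guide:

```
{
  "model": "my-model",
  "prompt": "Summarize the following text:",
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}
```

With this flag set, the final streamed chunk carries a `usage` object with token counts, which is what the endpoint picker needs to report usage metrics.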