`pkg/epp/metrics/README.md`

This documentation is the current state of exposed metrics.
## Requirements
For non-streaming requests, enable `Buffered` mode for the response in the `EnvoyExtensionPolicy`. This buffers the response body at the proxy and forwards it to the endpoint picker, which allows the endpoint picker to report response metrics:
```
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyExtensionPolicy
metadata:
  name: ext-proc-policy
  namespace: default
spec:
  extProc:
    - backendRefs:
        - group: ""
          kind: Service
          name: inference-gateway-ext-proc
          port: 9002
      processingMode:
        request:
          body: Buffered
        response:
          body: Buffered
```

For streaming requests, enable `Streamed` mode for the response in the `EnvoyExtensionPolicy`:

```
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyExtensionPolicy
metadata:
  name: ext-proc-policy
  namespace: default
spec:
  extProc:
    - backendRefs:
        - group: ""
          kind: Service
          name: inference-gateway-ext-proc
          port: 9002
      processingMode:
        request:
          body: Buffered
        response:
          body: Streamed
```

To include usage metrics for the vLLM model server, send the request with `include_usage`:
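As an illustration, in an OpenAI-compatible completions request, usage reporting for streaming responses is typically requested via the `stream_options` field; the model name and prompt below are placeholders, and the exact request shape may differ for your deployment:

```
{
  "model": "my-model",
  "prompt": "Summarize the following text ...",
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}
```

With `include_usage` set, the model server appends a usage object (prompt and completion token counts) to the stream, which the endpoint picker can use to report response metrics.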