Commit fb804b0

Amend the endpoint picker protocol to support fallbacks and subsetting (#445)
* Amend the endpoint picker protocol to support fallbacks and subsetting
* Addressed comments
* specify the behavior when the epp doesn't respect the subset
* addressing more comments
* Addressed comments
* Addressed comments 2
* typo
* clarified that errors must be returned using immediate response
* updated status code
1 parent d72819a commit fb804b0

1 file changed, +37 -7 lines changed

docs/proposals/004-endpoint-picker-protocol/README.md

@@ -9,27 +9,57 @@ This doc defines the protocol between the EPP and the proxy (e.g, Envoy).
 The EPP MUST implement the Envoy
 [external processing service](https://www.envoyproxy.io/docs/envoy/latest/api-v3/service/ext_proc/v3/external_processor) protocol.

+## Endpoint Subset
+For each HTTP request, the proxy CAN communicate the subset of endpoints the EPP MUST pick from by setting an unstructured entry in the [filter metadata](https://github.com/envoyproxy/go-control-plane/blob/63a55395d7a39a8d43dcc7acc3d05e4cae7eb7a2/envoy/config/core/v3/base.pb.go#L819) field of the ext-proc request. The metadata entry for the subset list MUST be wrapped with an outer key (which represents the metadata namespace) with a default of `envoy.lb.subset_hint`.
+
+```go
+filterMetadata: {
+  "envoy.lb.subset_hint": {
+    "x-gateway-destination-endpoint-subset": [<ip:port>, <ip:port>, ...]
+  }
+}
+```
+
+If the key `x-gateway-destination-endpoint-subset` is set, the EPP MUST only select endpoints from the specified list. If none of the endpoints in the list is eligible or the list is empty, then the EPP MUST return an [ImmediateResponse](https://github.com/envoyproxy/envoy/blob/f2023ef77bdb4abaf9feef963c9a0c291f55568f/api/envoy/service/ext_proc/v3/external_processor.proto#L195) with 503 (Service Unavailable) HTTP status code. If the EPP does not select from the list, then this leads to unpredictable behavior.
+
+If the key `x-gateway-destination-endpoint-subset` is not set, then the EPP MUST select from the set defined by the `InferencePool` selector.
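
Reviewer's note, not part of the diff: the subset rule added above can be read as a small decision procedure. The following is a minimal sketch under stated assumptions. `pickEndpoint`, the `map[string][]string` stand-in for the metadata namespace, the `pool` slice, and the `eligible` callback are all hypothetical names introduced for illustration; a real EPP would read the subset from the ext-proc request's filter metadata and would send the 503 as an `ImmediateResponse`, neither of which is shown here.

```go
package main

import "fmt"

// SubsetKey is the metadata key named in the proposal text.
const SubsetKey = "x-gateway-destination-endpoint-subset"

// pickEndpoint is a hypothetical helper, not part of the protocol.
//   - metadata stands in for the contents of the subset metadata namespace.
//   - pool stands in for the endpoints matched by the InferencePool selector.
//   - eligible is whatever readiness or scoring check the EPP applies (assumed).
//
// It returns ok == false when no endpoint can be picked, in which case the
// caller is expected to answer with an ImmediateResponse carrying 503.
func pickEndpoint(metadata map[string][]string, pool []string, eligible func(string) bool) (string, bool) {
	candidates := pool
	// If the key is set, even to an empty list, only the subset may be used.
	if subset, set := metadata[SubsetKey]; set {
		candidates = subset
	}
	for _, ep := range candidates {
		if eligible(ep) {
			return ep, true
		}
	}
	return "", false
}

func main() {
	pool := []string{"10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"}
	ready := func(ep string) bool { return ep != "10.0.0.2:8000" }

	// Key set: the pick is restricted to the listed endpoints.
	withSubset := map[string][]string{SubsetKey: {"10.0.0.2:8000", "10.0.0.3:8000"}}
	fmt.Println(pickEndpoint(withSubset, pool, ready))

	// Key set but empty: nothing may be picked, so the caller returns 503.
	fmt.Println(pickEndpoint(map[string][]string{SubsetKey: {}}, pool, ready))

	// Key not set: fall back to the InferencePool selector's endpoints.
	fmt.Println(pickEndpoint(nil, pool, ready))
}
```

Taking the first eligible candidate is only a placeholder for the EPP's actual scheduling logic, which the proposal does not prescribe.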
+
+## Destination Endpoint
 For each HTTP request, the EPP MUST communicate to the proxy the picked model server endpoint via:

 1. Setting the `x-gateway-destination-endpoint` HTTP header to the selected endpoint in <ip:port> format.

 2. Set an unstructured entry in the [dynamic_metadata](https://github.com/envoyproxy/go-control-plane/blob/c19bf63a811c90bf9e02f8e0dc1dcef94931ebb4/envoy/service/ext_proc/v3/external_processor.pb.go#L320) field of the ext-proc response. The metadata entry for the picked endpoint MUST be wrapped with an outer key (which represents the metadata namespace) with a default of `envoy.lb`.

-The final metadata necessary would look like:
+The primary endpoint MUST be set using the key `x-gateway-destination-endpoint` as follows:
 ```go
 dynamicMetadata: {
   "envoy.lb": {
-    "x-gateway-destination-endpoint": <ip:port>"
+    "x-gateway-destination-endpoint": <ip:port>
   }
 }
 ```

-Note:
-- If the EPP did not communicate the server endpoint via these two methods, it MUST return an error.
+Constraints:
+- If the EPP did not communicate the server endpoint via these two methods, it MUST return an error as follows:
+  - [ImmediateResponse](https://github.com/envoyproxy/envoy/blob/f2023ef77bdb4abaf9feef963c9a0c291f55568f/api/envoy/service/ext_proc/v3/external_processor.proto#L195) with 503 (Service Unavailable) HTTP status code if there are no ready endpoints.
+  - [ImmediateResponse](https://github.com/envoyproxy/envoy/blob/f2023ef77bdb4abaf9feef963c9a0c291f55568f/api/envoy/service/ext_proc/v3/external_processor.proto#L195) with 429 (Too Many Requests) HTTP status code if the request should be dropped (e.g., a Sheddable request, and the servers under heavy load).
 - The EPP MUST not set two different values in the header and the inner response metadata value.
+  - Setting different values leads to unpredictable behavior because proxies aren't guaranteed to support both paths, and so this protocol does not define what takes precedence.
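
Reviewer's note, not part of the diff: the constraints above pair each failure with a status code and require the header and metadata values to agree. The following is a minimal sketch of those two rules, assuming a hypothetical `failure` enum and hypothetical helpers `immediateStatus` and `checkConsistent`. Building the actual ext-proc `ImmediateResponse` message is out of scope here; only the status selection and the consistency check are illustrated.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// failure is a hypothetical enum for why no endpoint was communicated.
type failure int

const (
	noReadyEndpoints failure = iota // nothing is available to serve the request
	requestDropped                  // e.g. a Sheddable request under heavy load
)

// immediateStatus maps a failure to the HTTP status the constraints assign to it.
func immediateStatus(f failure) int {
	if f == requestDropped {
		return http.StatusTooManyRequests // 429
	}
	return http.StatusServiceUnavailable // 503
}

// errMismatch reports that the two communication paths disagree.
var errMismatch = errors.New("header and metadata name different endpoints")

// checkConsistent rejects a response whose header value and metadata value
// differ, since the protocol does not define which one takes precedence.
func checkConsistent(headerValue, metadataValue string) error {
	if headerValue != metadataValue {
		return errMismatch
	}
	return nil
}

func main() {
	fmt.Println(immediateStatus(noReadyEndpoints))
	fmt.Println(immediateStatus(requestDropped))
	fmt.Println(checkConsistent("10.0.0.1:8000", "10.0.0.1:8000"))
	fmt.Println(checkConsistent("10.0.0.1:8000", "10.0.0.2:8000"))
}
```

Running the consistency check inside the EPP before it responds is a design choice assumed here; the proposal only states that differing values lead to unpredictable behavior.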
+
+### Destination endpoint fallback
+A single fallback endpoint CAN be set using the key `x-gateway-destination-endpoint-fallback` in the same metadata namespace as the one used for `x-gateway-destination-endpoint` as follows:

-## Why envoy.lb namespace as a default?
-The `envoy.lb` namesapce is a predefined namespace used for subsetting. One common way to use the selected endpoint returned from the server, is [envoy subsets](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/subsets) where host metadata for subset load balancing must be placed under `envoy.lb`.
+```go
+dynamicMetadata: {
+  "envoy.lb": {
+    "x-gateway-destination-endpoint-fallback": <ip:port>
+  }
+}
+```

-Setting different value leads to unpredictable behavior because proxies aren't guaranteed to support both paths, and so this protocol does not define what takes precedence.
+### Why envoy.lb namespace as a default?
+The `envoy.lb` namespace is a predefined namespace. One common way to use the selected endpoint returned from the server is [envoy subsets](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/subsets), where host metadata for subset load balancing must be placed under `envoy.lb`. Note that this is not related to the subsetting feature discussed above; it is an Envoy implementation detail.

+## Matching An InferenceModel
+The model name of a request MUST match the `Spec.ModelName` parameter of one of the `InferenceModels` referencing the `InferencePool` managed by the EPP. Otherwise, the EPP MUST return a 404 status code.
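
Reviewer's note, not part of the diff: the matching rule in the last added section can be sketched as a lookup. This is an illustration only. `inferenceModel` is a hypothetical stand-in that models just the `Spec.ModelName` field, `matchModel` is an invented helper, and exact string equality is an assumption, since the proposal says "match" without defining the comparison.

```go
package main

import (
	"fmt"
	"net/http"
)

// inferenceModel is a hypothetical stand-in for the InferenceModel resource.
// Only the field the proposal names is modeled.
type inferenceModel struct {
	ModelName string // corresponds to Spec.ModelName
}

// matchModel looks for an InferenceModel whose name equals the requested
// model name. Exact equality is an assumption made for this sketch.
func matchModel(models []inferenceModel, requested string) (inferenceModel, bool) {
	for _, m := range models {
		if m.ModelName == requested {
			return m, true
		}
	}
	return inferenceModel{}, false
}

func main() {
	// Models assumed to reference the InferencePool managed by this EPP.
	models := []inferenceModel{{ModelName: "model-a"}, {ModelName: "model-b"}}

	for _, name := range []string{"model-b", "model-z"} {
		status := http.StatusOK
		if _, ok := matchModel(models, name); !ok {
			status = http.StatusNotFound // the proposal requires 404 here
		}
		fmt.Println(name, status)
	}
}
```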
