2 files changed, +2 -2 lines changed
003-model-server-protocol
004-endpoint-picker-protocol
@@ -43,7 +43,7 @@ The model server MUST expose the following LoRA adapter metrics via the same Pro
 * Metric value: The last updated timestamp (so the EPP can find the latest).
 * Metric labels:
 * `max_lora`: The maximum number of adapters that can be loaded to GPU memory to serve a batch.
- Requests will be queued if the model server has reached MaxActiveAdapter and canno load the
+ Requests will be queued if the model server has reached MaxActiveAdapter and cannot load the
 requested adapter. Example: `"max_lora": "8"`.
 * `running_lora_adapters`: A comma separated list of adapters that are currently loaded in GPU
 memory and ready to serve requests. Example: `"running_lora_adapters": "adapter1, adapter2"`
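To illustrate how a consumer of these LoRA metric labels might interpret them, here is a minimal Python sketch. It assumes the metric's samples have already been scraped into `(labels, value)` pairs, where the value is the last-updated timestamp described above. The function names are hypothetical and are not taken from the EPP code in `pkg/epp/`, and the queueing check is a simplified reading of the `max_lora` description (an adapter that is not yet loaded can only be loaded while fewer than `max_lora` adapters are running).

```python
from collections.abc import Mapping, Sequence

# Hypothetical sample shape: (label set, metric value = last-updated timestamp).
Sample = tuple[Mapping[str, str], float]


def latest_labels(samples: Sequence[Sample]) -> Mapping[str, str]:
    """Return the label set of the most recently updated sample (largest timestamp)."""
    labels, _timestamp = max(samples, key=lambda s: s[1])
    return labels


def parse_running_adapters(labels: Mapping[str, str]) -> list[str]:
    """Split the comma-separated `running_lora_adapters` label into adapter names.

    Surrounding whitespace is stripped (the documented example is
    "adapter1, adapter2") and empty entries are dropped.
    """
    raw = labels.get("running_lora_adapters", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


def can_serve_without_queueing(labels: Mapping[str, str], adapter: str) -> bool:
    """Hypothetical check: would a request for `adapter` avoid being queued?

    Assumption: a request queues only when the adapter is not already loaded
    and the number of loaded adapters has reached `max_lora`.
    """
    running = parse_running_adapters(labels)
    if adapter in running:
        return True  # adapter is already resident in GPU memory
    max_lora = int(labels["max_lora"])  # label values are strings, e.g. "8"
    return len(running) < max_lora


# Usage with made-up sample data.
samples = [
    ({"max_lora": "2", "running_lora_adapters": "adapter1"}, 1700000000.0),
    ({"max_lora": "2", "running_lora_adapters": "adapter1, adapter2"}, 1700000060.0),
]
labels = latest_labels(samples)
print(can_serve_without_queueing(labels, "adapter2"))  # True: already loaded
print(can_serve_without_queueing(labels, "adapter3"))  # False: 2 of 2 slots in use
```

Picking the sample with the largest value mirrors the stated purpose of the timestamp metric value: when several label sets have been reported, only the newest one reflects the server's current adapter state.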
@@ -7,7 +7,7 @@ found [here](../../../pkg/epp/).
 This doc defines the protocol between the EPP and the proxy (e.g, Envoy).

 The EPP MUST implement the Envoy
-[external processing service](https://www.envoyproxy.io/docs/envoy/latest/api-v3/service/ext_proc/v3/external_processor) protocol.
+[external processing service](https://www.envoyproxy.io/docs/envoy/latest/api-v3/service/ext_proc/v3/external_processor) protocol.

 For each HTTP request, the EPP MUST communicate to the proxy the picked model server endpoint via: