# InferencePool

A chart to deploy an InferencePool and a corresponding EndpointPicker (epp) deployment.

## Install

To install an InferencePool named `pool-1` that selects endpoints with the label `app: vllm-llama2-7b` listening on port `8000`, run the following command:

```txt
$ helm install pool-1 ./config/charts/inferencepool \
  --set inferencePool.name=pool-1 \
  --set inferencePool.selector.app=vllm-llama2-7b \
  --set inferencePool.targetPortNumber=8000
```

where `inferencePool.targetPortNumber` is the port that the vLLM backends are served on and `inferencePool.selector` is the label selector used to match the vLLM backends.
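
Once the chart is installed, you can verify that the InferencePool and the EndpointPicker deployment were created. This is a minimal sanity check, assuming the release was installed with the values from the example above:

```txt
$ kubectl get inferencepool pool-1
$ kubectl get deployment pool-1-epp
```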

## Uninstall

Run the following command to uninstall the chart:

```txt
$ helm uninstall pool-1
```
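
If you want to confirm that the release is gone, you can list the installed Helm releases and check for the InferencePool resource. Both commands below assume the release name from the install example and should report nothing (or "not found") after a successful uninstall:

```txt
$ helm list
$ kubectl get inferencepool pool-1
```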

## Configuration

The following table lists the configurable parameters of the chart.

| **Parameter Name**                     | **Description**                                                                                                     |
|----------------------------------------|---------------------------------------------------------------------------------------------------------------------|
| `inferencePool.name`                   | Name of the InferencePool. The EndpointPicker extension is named `${inferencePool.name}-epp`.                        |
| `inferencePool.targetPortNumber`       | Port on which the vLLM backends are served; also used by the inference extension to scrape metrics.                  |
| `inferencePool.selector`               | Label selector used to match the vLLM backends managed by the InferencePool.                                         |
| `inferenceExtension.replicas`          | Number of replicas for the inference extension service. Defaults to `1`.                                             |
| `inferenceExtension.image.name`        | Name of the container image used for the inference extension.                                                        |
| `inferenceExtension.image.hub`         | Registry URL where the inference extension image is hosted.                                                          |
| `inferenceExtension.image.tag`         | Image tag of the inference extension.                                                                                 |
| `inferenceExtension.image.pullPolicy`  | Image pull policy for the container. Possible values: `Always`, `IfNotPresent`, or `Never`. Defaults to `Always`.    |
| `inferenceExtension.extProcPort`       | Port on which the inference extension serves its external processing (ext-proc) service. Defaults to `9002`.         |
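
As an example, the inference extension image and replica count can be overridden with additional `--set` flags at install time. The registry, image name, and tag below are illustrative placeholders rather than the chart defaults:

```txt
$ helm install pool-1 ./config/charts/inferencepool \
  --set inferencePool.name=pool-1 \
  --set inferencePool.selector.app=vllm-llama2-7b \
  --set inferencePool.targetPortNumber=8000 \
  --set inferenceExtension.replicas=2 \
  --set inferenceExtension.image.hub=registry.example.com \
  --set inferenceExtension.image.name=epp \
  --set inferenceExtension.image.tag=v0.1.0 \
  --set inferenceExtension.image.pullPolicy=IfNotPresent
```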

## Notes

This chart only deploys an InferencePool and its corresponding EndpointPicker extension. Before installing the chart, make sure that the inference extension CRDs are installed in the cluster. For more details, please refer to the [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/).
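
A quick way to check whether the CRDs are already present is to list the cluster's CustomResourceDefinitions and look for the inference extension API group. The exact CRD names depend on the version you install, so treat the command below as an indicative check:

```txt
$ kubectl get crd | grep inference
```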