# Inference Pool

??? example "Alpha since v0.1.0"

    The `InferencePool` resource is alpha and may have breaking changes in
    future releases of the API.

## Background

At its core, the InferencePool is a logical grouping of compute, expressed in the form of Pods (typically model servers), akin to a Kubernetes Service. The InferencePool deploys its own routing and offers administrative configuration to the Platform Admin.
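To make this concrete, here is a minimal sketch of what an InferencePool manifest might look like. The resource name, labels, and port are illustrative, and the API version and field names are assumptions based on the alpha API, which may change between releases; consult the spec referenced below for the authoritative schema.

```yaml
# Hypothetical InferencePool manifest (alpha API; names and values are illustrative).
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: my-model-pool
spec:
  # Pods whose labels match this selector are members of the pool.
  selector:
    app: my-model-server
  # Port on which the member model servers accept inference traffic.
  targetPortNumber: 8000
  # Endpoint picker extension that implements the pool's routing logic.
  extensionRef:
    name: my-model-pool-epp
```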

It is expected for the InferencePool to:

- Enforce fair consumption of resources across competing workloads
- Efficiently route requests across shared compute (as displayed by the PoC)

It is not expected for the InferencePool to:

- Enforce that any common set of adapters or base models is available on the Pods
- Manage Deployments of Pods within the Pool
- Manage the Pod lifecycle of Pods within the Pool

Additionally, any Pod that seeks to join an InferencePool would need to support a protocol, defined by this project, to ensure the Pool has adequate information to intelligently route requests.
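For illustration, pool membership itself is plain Kubernetes label selection: a Pod joins the hypothetical pool sketched above simply by carrying matching labels, while the protocol requirement falls on the model server running inside it. The Pod name and image below are hypothetical.

```yaml
# Hypothetical model-server Pod; its labels match the pool's selector above.
apiVersion: v1
kind: Pod
metadata:
  name: my-model-server-0
  labels:
    app: my-model-server   # matches spec.selector of my-model-pool
spec:
  containers:
    - name: model-server
      image: example.com/my-model-server:latest  # must implement the project's routing protocol
      ports:
        - containerPort: 8000  # matches the pool's targetPortNumber
```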

The InferencePool has some small overlap with the Service spec, shown here:

*(Figure: Comparing InferencePool with Service)*

The InferencePool is not intended to be a mask of the Service object; it simply exposes the absolute bare minimum required to allow the Platform Admin to focus less on networking, and more on Pool management.
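As a rough illustration of that overlap, compare the pool manifest above with a plain Service selecting the same Pods: both select workloads by label and name a target port, but the Service carries general networking concerns (service ports, types, protocols) that the InferencePool omits in favor of routing configuration. The names below reuse the same illustrative values from earlier.

```yaml
# A plain Service over the same illustrative Pods, for comparison.
apiVersion: v1
kind: Service
metadata:
  name: my-model-service
spec:
  selector:
    app: my-model-server  # same label selection as the InferencePool
  ports:
    - port: 80            # general networking detail the InferencePool omits
      targetPort: 8000    # corresponds to the pool's targetPortNumber
```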

## Spec

The full spec of the InferencePool is defined here.