
# Inference Model

??? example "Alpha since v0.1.0"

    The `InferenceModel` resource is alpha and may have breaking changes in
    future releases of the API.

## Background

An InferenceModel allows the Inference Workload Owner to define:

- Which model/LoRA adapter(s) to consume.
    - Maps a client-facing model name to the target model name(s) in the InferencePool.
    - Supports traffic splitting between adapters in the same InferencePool, so new LoRA adapter versions can be rolled out gradually.
- Criticality of the requests to the InferenceModel.
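To make the fields above concrete, here is a minimal sketch of what an InferenceModel manifest might look like. The adapter, model, and pool names are hypothetical, and the `apiVersion` reflects one alpha revision of the API; since the resource is alpha, check the current CRD for the exact field names in your release.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: my-adapter            # hypothetical resource name
spec:
  modelName: my-model         # client-facing model name exposed to callers
  criticality: Critical       # criticality of requests to this model
  poolRef:
    name: my-pool             # the InferencePool serving this model
  targetModels:
  - name: my-adapter-v1       # current LoRA adapter keeps most traffic
    weight: 90
  - name: my-adapter-v2       # new adapter version receives a canary share
    weight: 10
```

The `targetModels` weights illustrate the traffic-splitting behavior described above: shifting weight between entries rolls a new adapter version out incrementally without changing the client-facing model name.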

## Spec

The full spec of the InferenceModel is defined here.