??? example "Alpha since v0.1.0"
The `InferenceModel` resource is alpha and may have breaking changes in
future releases of the API.
An InferenceModel allows the Inference Workload Owner to define:

- Which Model/LoRA adapter(s) to consume.
- Mapping from a client-facing model name to the target model name in the InferencePool.
- Traffic splitting between adapters in the same InferencePool, so that new LoRA adapter versions can be rolled out easily.
- Criticality of the requests to the InferenceModel.

The full spec of the InferenceModel is defined here.
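
For illustration, a manifest along the following lines maps a client-facing model name to two LoRA adapter versions and splits traffic between them. This is a minimal sketch assuming the `inference.networking.x-k8s.io/v1alpha2` API version; the adapter names, pool name, weights, and criticality value shown here are hypothetical, so consult the spec above for the authoritative fields.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: tweet-summary
spec:
  # Client-facing model name that requests will use.
  modelName: tweet-summary
  # How requests for this model should be treated under load
  # (illustrative value; see the spec for allowed values).
  criticality: Standard
  # The InferencePool whose endpoints serve this model.
  poolRef:
    name: my-inference-pool
  # Split traffic between two LoRA adapter versions to roll out v2 gradually.
  targetModels:
  - name: tweet-summary-v1
    weight: 90
  - name: tweet-summary-v2
    weight: 10
```

With a resource like this applied, requests naming `tweet-summary` would mostly be served by the `tweet-summary-v1` adapter, with a small share routed to `tweet-summary-v2` until the new version is promoted.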