@@ -1,10 +1,6 @@
-# Documentation
+# Metrics
 
-This documentation is the current state of exposed metrics.
-
-## Table of Contents
-* [Exposed Metrics](#exposed-metrics)
-* [Scrape Metrics](#scrape-metrics)
+This guide describes the current state of exposed metrics and how to scrape them.
 
 ## Requirements
 
|
@@ -38,17 +34,17 @@ spec:
 
 ## Exposed metrics
 
-| Metric name | Metric Type | Description | Labels | Status |
-| ------------|--------------| ----------- | ------ | ------ |
-| inference_model_request_total | Counter | The counter of requests broken out for each model. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_request_error_total | Counter | The counter of requests errors broken out for each model. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_request_duration_seconds | Distribution | Distribution of response latency. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_request_sizes | Distribution | Distribution of request size in bytes. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_response_sizes | Distribution | Distribution of response size in bytes. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_input_tokens | Distribution | Distribution of input token count. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_output_tokens | Distribution | Distribution of output token count. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_pool_average_kv_cache_utilization | Gauge | The average kv cache utilization for an inference server pool. | `name`=<inference-pool-name> | ALPHA |
-| inference_pool_average_queue_size | Gauge | The average number of requests pending in the model server queue. | `name`=<inference-pool-name> | ALPHA |
+| **Metric name**                               | **Metric Type**  | <div style="width:200px">**Description**</div>                     | <div style="width:250px">**Labels**</div>                                           | **Status**  |
+|:----------------------------------------------|:-----------------|:-------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------|
+| inference_model_request_total                 | Counter          | The counter of requests broken out for each model.                 | `model_name`=<model-name> <br> `target_model_name`=<target-model-name>              | ALPHA       |
+| inference_model_request_error_total           | Counter          | The counter of request errors broken out for each model.           | `model_name`=<model-name> <br> `target_model_name`=<target-model-name>              | ALPHA       |
+| inference_model_request_duration_seconds      | Distribution     | Distribution of response latency.                                  | `model_name`=<model-name> <br> `target_model_name`=<target-model-name>              | ALPHA       |
+| inference_model_request_sizes                 | Distribution     | Distribution of request size in bytes.                             | `model_name`=<model-name> <br> `target_model_name`=<target-model-name>              | ALPHA       |
+| inference_model_response_sizes                | Distribution     | Distribution of response size in bytes.                            | `model_name`=<model-name> <br> `target_model_name`=<target-model-name>              | ALPHA       |
+| inference_model_input_tokens                  | Distribution     | Distribution of input token count.                                 | `model_name`=<model-name> <br> `target_model_name`=<target-model-name>              | ALPHA       |
+| inference_model_output_tokens                 | Distribution     | Distribution of output token count.                                | `model_name`=<model-name> <br> `target_model_name`=<target-model-name>              | ALPHA       |
+| inference_pool_average_kv_cache_utilization   | Gauge            | The average KV cache utilization for an inference server pool.     | `name`=<inference-pool-name>                                                        | ALPHA       |
+| inference_pool_average_queue_size             | Gauge            | The average number of requests pending in the model server queue.  | `name`=<inference-pool-name>                                                        | ALPHA       |
 
 ## Scrape Metrics
 
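The metrics in the table above are served in the Prometheus text exposition format, so a scrape response can be checked with a few lines of stdlib Python. The payload below is an illustrative sample (the label values are made up, not captured output), and this is only a rough sketch of the format, not the project's scraping tooling:

```python
# Minimal sketch: parse Prometheus text-format samples for the metrics
# listed in the table above. The sample payload is illustrative only.
import re

sample = """\
# TYPE inference_model_request_total counter
inference_model_request_total{model_name="llama",target_model_name="llama-v1"} 42
# TYPE inference_pool_average_queue_size gauge
inference_pool_average_queue_size{name="pool-a"} 3.5
"""

def parse_metrics(text):
    """Return {metric_name: [(labels_dict, value), ...]}, skipping comments."""
    out = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # ignore blank lines and HELP/TYPE comments
        m = re.match(r'^(\w+)(?:\{([^}]*)\})?\s+(\S+)$', line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(re.findall(r'(\w+)="([^"]*)"', raw_labels or ""))
        out.setdefault(name, []).append((labels, float(value)))
    return out

metrics = parse_metrics(sample)
print(metrics["inference_model_request_total"][0][1])  # 42.0
```

In a real setup the text would come from the endpoint's `/metrics` response rather than a string literal.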