Add a metric to track number of ready pods #599

ahg-g · 2025-03-28T13:38:37Z

What would you like to be added:

A metric emitted by the EPP to track the number of ready pods from the EPP PoV.

Why is this needed:

The EPP doesn't own the status of the InferencePool object, so it can't record the ready pod count there. A metric that tracks the number of ready pods is important for operators monitoring rollouts.

smarterclayton · 2025-03-28T13:49:19Z

We should probably plan to break that metric down by attributes the pool is aware of (i.e. not O(pods), but O(groups of pods under the pool)). We don't have such a label yet, but I can already see some coming up, like heterogeneity and assignment.
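
To make the cardinality point concrete, here is a minimal Python sketch that counts ready pods per group instead of emitting one series per pod. Everything named here is an assumption for illustration: the `Pod` type, the `group` attribute (the pool has no such label today), and the metric name `inference_pool_ready_pods`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    group: str   # hypothetical grouping attribute, e.g. an accelerator type
    ready: bool


def ready_pods_by_group(pods):
    """Count ready pods per group: one entry per group, not per pod."""
    # Start every group at zero so groups with no ready pods still report.
    counts = {pod.group: 0 for pod in pods}
    for pod in pods:
        if pod.ready:
            counts[pod.group] += 1
    return counts


def render(pool, counts):
    """Render counts as Prometheus text-format samples (hypothetical metric name)."""
    return [
        f'inference_pool_ready_pods{{name="{pool}",group="{group}"}} {count}'
        for group, count in sorted(counts.items())
    ]
```

With this shape, the number of series grows with the number of groups under the pool, regardless of how many pods each group contains.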

JeffLuoo · 2025-03-28T17:48:01Z

I have an idea: add a new metric for the inference pool,

inference_pool_per_pod_queue_size{name=<inference pool name>, pod=<name of pod under the pool>}

which reports the queue size for each pod under the pool. It can provide:

The number of pods in the pool.
The queue length for each pod, which shows whether our scheduling algorithm (or the priority) works as intended.
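
As a rough illustration of these two uses, the Python sketch below renders one sample per pod in Prometheus text format, derives the pod count from the number of series, and computes a simple imbalance figure. The metric and label names come from the proposal above; the `queue_sizes` mapping, the helper names, and the choice of max minus min as an imbalance signal are assumptions.

```python
def render_queue_sizes(pool, queue_sizes):
    """Render one text-format sample per pod from a {pod name: queue size} mapping."""
    return [
        f'inference_pool_per_pod_queue_size{{name="{pool}",pod="{pod}"}} {size}'
        for pod, size in sorted(queue_sizes.items())
    ]


def pod_count(queue_sizes):
    """The number of pods equals the number of per-pod series."""
    return len(queue_sizes)


def queue_imbalance(queue_sizes):
    """Gap between the longest and shortest queue; 0 when there are no pods."""
    if not queue_sizes:
        return 0
    sizes = queue_sizes.values()
    return max(sizes) - min(sizes)
```

Note that this emits one series per pod, so the metric's cardinality grows with the size of the pool.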

WDYT? @ahg-g @smarterclayton

JeffLuoo mentioned this issue Mar 31, 2025

[Metrics] Add number of ready pods metric for inference pool #622

Merged

k8s-ci-robot closed this as completed in #622 Mar 31, 2025
