[Metrics] Add average kv cache and waiting queue size metrics for inference pool #304
Conversation
Hi @JeffLuoo. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
✅ Deploy Preview for gateway-api-inference-extension ready!
/ok-to-test
[]string{"name"}, | ||
) | ||
|
||
inferencePoolAvgQueueSize = compbasemetrics.NewGaugeVec( |
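For context on the snippet above, a complete gauge definition with k8s.io/component-base/metrics looks roughly like the sketch below. The metric names, help strings, and stability level are illustrative assumptions, not necessarily the exact values in this PR.

```go
package metrics

import (
	compbasemetrics "k8s.io/component-base/metrics"
)

var (
	// Hypothetical names and help text; the PR's actual values may differ.
	inferencePoolAvgKVCache = compbasemetrics.NewGaugeVec(
		&compbasemetrics.GaugeOpts{
			Subsystem:      "inference_pool",
			Name:           "average_kv_cache_utilization",
			Help:           "Average KV cache utilization across the model server pods in the pool.",
			StabilityLevel: compbasemetrics.ALPHA,
		},
		[]string{"name"},
	)

	inferencePoolAvgQueueSize = compbasemetrics.NewGaugeVec(
		&compbasemetrics.GaugeOpts{
			Subsystem:      "inference_pool",
			Name:           "average_queue_size",
			Help:           "Average number of requests waiting in the model server queues.",
			StabilityLevel: compbasemetrics.ALPHA,
		},
		[]string{"name"},
	)
)
```

Registering such gauges with legacyregistry.MustRegister (from k8s.io/component-base/metrics/legacyregistry) in an init function would be the usual pattern, assuming the project follows standard component-base conventions.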
@raywainman we are reporting the average queue length across model servers; what alternatives would you suggest for consuming this with HPA? Can HPA consume a distribution and do the aggregation on its end, so that the user has more flexibility in how to aggregate?
/cc @smarterclayton
HPA can't consume a distribution directly today unless we put a Prometheus adapter in front of the metric and convert it to a plain gauge metric (which is doable). For example, you could express "get the 90th-percentile queue size over the last 5 minutes" that way. Do we anticipate that being useful?
If so, we could maybe emit both: one simple gauge metric reporting the instantaneous average queue size across all model servers, and another metric with a distribution.
@JeffLuoo what do you think?
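To make the "emit both" idea concrete, here is a hedged sketch of what the companion distribution metric could look like next to the gauge, again using k8s.io/component-base/metrics. The metric name and bucket boundaries are assumptions for illustration, not part of this PR.

```go
// Hypothetical companion histogram: a Prometheus adapter could turn a
// quantile of this distribution into a plain gauge that HPA can consume.
var queueSizeDistribution = compbasemetrics.NewHistogramVec(
	&compbasemetrics.HistogramOpts{
		Subsystem:      "inference_pool",
		Name:           "queue_size",
		Help:           "Per-pod waiting queue sizes observed at each metrics probe.",
		Buckets:        []float64{0, 1, 2, 4, 8, 16, 32, 64, 128, 256},
		StabilityLevel: compbasemetrics.ALPHA,
	},
	[]string{"name"},
)
```

Each probe would then Observe every pod's queue size, and a consumer could derive, say, a windowed 90th percentile on top of it.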
In our benchmarking, we scrape gauge metrics for cache utilization and queue size. Let's discuss whether a distribution for queue size is more helpful, or whether other metrics from the model servers are more helpful.
Inference pool metrics are calculated directly from model server metrics (vLLM in the current implementation).
Let's target adding the new metrics (e.g. percentiles) in a follow-up CL so this CL isn't blocked.
That sounds great, made #306 to track
pkg/ext-proc/backend/provider.go (Outdated)
podTotalCount++
if val, ok := p.podMetrics.Load(pod.Name); ok {
	pm := val.(*PodMetrics)
	kvCacheTotal += pm.KVCacheUsagePercent
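Presumably the surrounding loop walks every pod, sums the probed values, and divides by the pod count to set the pool-level gauges. A hedged sketch of that shape is below; the function name, the queueTotal/WaitingQueueSize fields, and the Record* helpers are hypothetical stand-ins, not the exact code in this PR.

```go
// Hedged sketch of the pool-level averaging; only p.podMetrics, PodMetrics,
// podTotalCount, and KVCacheUsagePercent come from the diff above.
func (p *Provider) refreshPoolMetrics(pods []*Pod) {
	var kvCacheTotal float64
	var queueTotal, podTotalCount int
	for _, pod := range pods {
		podTotalCount++
		if val, ok := p.podMetrics.Load(pod.Name); ok {
			pm := val.(*PodMetrics)
			kvCacheTotal += pm.KVCacheUsagePercent
			queueTotal += pm.WaitingQueueSize
		}
	}
	if podTotalCount == 0 {
		return
	}
	// Record* are hypothetical setters for the pool-level gauges defined earlier.
	RecordInferencePoolAvgKVCache(kvCacheTotal / float64(podTotalCount))
	RecordInferencePoolAvgQueueSize(float64(queueTotal) / float64(podTotalCount))
}
```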
Just a high-level thought...
As an optimization, would we ever consider doing this calculation as part of the actual logic in
func leastKVCacheFilterFunc(req *LLMRequest, pods []*backend.PodMetrics) ([]*backend.PodMetrics, error) {
Then we are computing these metrics directly in-line with the endpoint picking logic and could get the absolute freshest value.
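Roughly what that could look like, as a hedged sketch: the filter already receives the freshest per-pod metrics, so it could update the pool gauge as a side effect before picking. The RecordInferencePoolAvgKVCache helper is an assumption for illustration, not code from this PR.

```go
// Hypothetical variant of leastKVCacheFilterFunc that reports the pool-level
// average KV cache utilization from the same snapshot used for picking.
func leastKVCacheFilterFunc(req *LLMRequest, pods []*backend.PodMetrics) ([]*backend.PodMetrics, error) {
	if len(pods) > 0 {
		var total float64
		for _, pm := range pods {
			total += pm.KVCacheUsagePercent
		}
		RecordInferencePoolAvgKVCache(total / float64(len(pods)))
	}
	// ...the existing least-KV-cache selection logic would follow unchanged...
	return pods, nil
}
```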
Actually, one thing to consider here is that the probing will be frequent, and currently the podMetrics map only reflects the latest probed value. We should consider aggregating over a time window to avoid oscillations. @liu-cong @kaushikmitr did we think about this in the context of the endpoint picking algorithm (i.e., using the absolute last value vs aggregation over a window)?
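As an illustration of the window idea (not code from this PR), per-pod values could be smoothed over the last N probes before reporting or endpoint picking; the type and field names below are made up.

```go
// Illustrative-only sliding window over probed metric values.
type metricWindow struct {
	samples []float64 // most recent probed values, oldest first
	size    int       // maximum number of samples to retain
}

// add records a newly probed value, evicting the oldest once the window is full.
func (w *metricWindow) add(v float64) {
	w.samples = append(w.samples, v)
	if len(w.samples) > w.size {
		w.samples = w.samples[1:]
	}
}

// mean returns the windowed average, smoothing probe-to-probe oscillation.
func (w *metricWindow) mean() float64 {
	if len(w.samples) == 0 {
		return 0
	}
	var sum float64
	for _, s := range w.samples {
		sum += s
	}
	return sum / float64(len(w.samples))
}
```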
Overall LGTM, this lays out a good foundation and we can build on this by adding more metrics over time.
/lgtm
Looks great! Thanks for this, really cool to see metrics at the pool level coming out. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: JeffLuoo, kfswain
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
No description provided.