Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow partial metric updates #561

Merged
merged 1 commit into from
Mar 24, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion pkg/epp/backend/metrics/metrics.go
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,8 @@ type PodMetricsClientImpl struct {
MetricMapping *MetricMapping
}

// FetchMetrics fetches metrics from a given pod.
// FetchMetrics fetches metrics from a given pod, clones the existing metrics object and returns an
// updated one.
func (p *PodMetricsClientImpl) FetchMetrics(
ctx context.Context,
pod *Pod,
Expand Down
21 changes: 13 additions & 8 deletions pkg/epp/backend/metrics/pod_metrics.go
Original file line number Diff line number Diff line change
Expand Up @@ -116,16 +116,21 @@ func (pm *podMetrics) refreshMetrics() error {
updated, err := pm.pmc.FetchMetrics(ctx, pm.GetPod(), pm.GetMetrics(), pool.Spec.TargetPortNumber)
if err != nil {
pm.logger.V(logutil.TRACE).Info("Failed to refreshed metrics:", "err", err)
// As refresher is running in the background, it's possible that the pod is deleted but
// the refresh goroutine doesn't read the done channel yet. In this case, we just return nil.
// The refresher will be stopped after this interval.
return nil
}
updated.UpdateTime = time.Now()

pm.logger.V(logutil.TRACE).Info("Refreshed metrics", "updated", updated)
// Optimistically update metrics even if there was an error.
// The FetchMetrics can return an error for the following reasons:
// 1. As refresher is running in the background, it's possible that the pod is deleted but
// the refresh goroutine doesn't read the done channel yet. In this case, the updated
// metrics object will be nil. And the refresher will soon be stopped.
// 2. The FetchMetrics call can partially fail. For example, due to one metric missing. In
// this case, the updated metrics object will have partial updates. A partial update is
// considered better than no updates.
if updated != nil {
updated.UpdateTime = time.Now()
pm.logger.V(logutil.TRACE).Info("Refreshed metrics", "updated", updated)
atomic.StorePointer(&pm.metrics, unsafe.Pointer(updated))
}

atomic.StorePointer(&pm.metrics, unsafe.Pointer(updated))
return nil
}

Expand Down