Latency: Low metric fetch latency is crucial because the load balancing algorithm relies on near-real-time metrics from the model servers, so the fetch latency itself needs to be monitored.
Currently we simply log the latency; we should integrate Prometheus and add a Histogram metric.
Error count: This can help detect bad backends.
Other consideration: We may need to sample the metrics instead of recording every probing call, due to the high probing frequency.
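To make the proposal concrete, here is a minimal, dependency-free sketch of the three ideas above: a bucketed latency histogram, a per-backend error count, and sampling of latency observations. Everything in it is an assumption for illustration: the names `LatencyHistogram` and `ProbeMetrics`, the bucket bounds, the sample rate, and the choice to always count errors while sampling only latency. The in-memory histogram is only a stand-in for a real Prometheus Histogram from a client library.

```python
import bisect
import random


class LatencyHistogram:
    """In-memory stand-in for a Prometheus-style histogram (hypothetical)."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        # One slot per upper bound, plus a final slot acting as +Inf.
        # These are per-bucket counts; Prometheus exposes them cumulatively.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0.0
        self.n = 0

    def observe(self, seconds):
        # bisect_left returns the first bound >= seconds, i.e. the "le" bucket.
        self.counts[bisect.bisect_left(self.bounds, seconds)] += 1
        self.total += seconds
        self.n += 1


class ProbeMetrics:
    """Records metric-fetch results; latency is sampled, errors are not."""

    def __init__(self, sample_rate, bounds, rng=None):
        if not 0.0 < sample_rate <= 1.0:
            raise ValueError("sample_rate must be in (0, 1]")
        self.sample_rate = sample_rate
        self.latency = LatencyHistogram(bounds)
        self.errors = {}  # backend name -> failed fetch count
        self._rng = rng or random.Random()

    def record(self, backend, seconds, ok):
        if not ok:
            # Assumption: failures are counted on every call so that a bad
            # backend is never hidden by sampling.
            self.errors[backend] = self.errors.get(backend, 0) + 1
            return
        # Assumption: only a random fraction of successful probes is
        # observed, to keep the cost low at high probing frequency.
        if self._rng.random() < self.sample_rate:
            self.latency.observe(seconds)
```

With this shape, the probing loop would call `record(backend, elapsed, ok)` after each fetch. A sample rate of 1.0 records every probe, which may be a reasonable starting point until the overhead is actually measured.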