Improve vLLM upstream health checks to only pass when models are servable #558

smarterclayton · 2025-03-21T14:57:33Z

As documented in #550, the default vLLM configuration could be improved and better documented. A startupProbe on /health is the right default for vLLM, because vLLM does not start its server until a potentially very long model load is complete, but the tunables may vary.
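To make the tunables concrete, here is a minimal sketch of how a startupProbe budget could be derived from an expected model load time. It relies on the Kubernetes rule that a startup probe tolerates roughly `failureThreshold * periodSeconds` seconds before the container is restarted. The helper name `startup_probe`, the 30-minute load time, the 10-second period, and port 8000 are all assumptions for illustration, not values taken from #550.

```python
import json
import math


def startup_probe(max_load_seconds: int, period_seconds: int = 10, port: int = 8000) -> dict:
    """Build a startupProbe spec (hypothetical helper) that tolerates a model
    load of up to max_load_seconds before the kubelet restarts the container."""
    if max_load_seconds <= 0 or period_seconds <= 0:
        raise ValueError("durations must be positive")
    # The kubelet gives up after failureThreshold consecutive failures,
    # so the startup budget is roughly failureThreshold * periodSeconds.
    failure_threshold = math.ceil(max_load_seconds / period_seconds)
    return {
        "httpGet": {"path": "/health", "port": port},  # port is an assumed value
        "periodSeconds": period_seconds,
        "failureThreshold": failure_threshold,
    }


# Assumed worst case: a 30-minute model load, probed every 10 seconds.
probe = startup_probe(max_load_seconds=1800)
print(json.dumps({"startupProbe": probe}, indent=2))
```

The printed JSON can be pasted into a container spec as-is, since Kubernetes manifests accept JSON as well as YAML. Larger models or slower storage would call for a larger `max_load_seconds`.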

smarterclayton mentioned this issue Mar 21, 2025

Configure the vllm deployment with best practices for startup #550

Merged
