Can we use vLLM on a CPU-only machine, without a GPU?

Replies: 4 comments 3 replies
- I think the short answer is no, as vLLM's engine relies on custom kernels written in CUDA.
- You can try ctranslate2 or llama.cpp.
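  As a rough illustration, a CPU-only run with the llama-cpp-python bindings could look like the sketch below. The model path, context size, and thread count are placeholders (not anything from this thread), and you need a GGUF-converted model on disk first.

  ```python
  # Sketch: CPU-only generation via llama-cpp-python (pip install llama-cpp-python).
  # The model path, context size, and thread count are illustrative placeholders.
  from llama_cpp import Llama

  llm = Llama(
      model_path="./models/llama-2-7b.Q4_K_M.gguf",  # any GGUF model file
      n_ctx=2048,      # context window
      n_threads=8,     # CPU threads to use
  )

  result = llm("Q: Can vLLM run without a GPU? A:", max_tokens=64, stop=["Q:"])
  print(result["choices"][0]["text"])
  ```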
- This was not clear to me either. Is there any way to highlight this in bold somewhere in the main docs? Sorry if I overlooked it. I am trying to do some local testing; that's my use case.
- Intel CPUs are supported, in my own experience.
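  If your vLLM version does include a CPU backend (recent releases document building from source with something like VLLM_TARGET_DEVICE=cpu), a minimal offline-inference sketch might look like this. The model id and sampling settings are just examples, not anything stated in this thread.

  ```python
  # Sketch: offline inference with vLLM's Python API on a CPU-only machine.
  # Assumes your vLLM install was built with the CPU backend; the model id and
  # sampling parameters are illustrative, not taken from this thread.
  from vllm import LLM, SamplingParams

  llm = LLM(model="facebook/opt-125m")  # small model keeps a CPU smoke test quick
  params = SamplingParams(temperature=0.8, max_tokens=32)

  outputs = llm.generate(["Hello, my name is"], params)
  for out in outputs:
      print(out.outputs[0].text)
  ```

  If the CPU backend isn't present in your build, the `LLM` constructor will typically fail while trying to initialize CUDA.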