Commit 42bb201

[V1][Minor] Set pin_memory=False for token_ids_cpu tensor (#11581)
Signed-off-by: Woosuk Kwon <[email protected]>
1 parent: 59d6bb4

File tree: 1 file changed, +3 -1 lines changed

vllm/v1/worker/gpu_input_batch.py

Lines changed: 3 additions & 1 deletion
@@ -57,11 +57,13 @@ def __init__(
 
         # TODO(woosuk): This buffer could be too large if max_model_len is big.
         # Find a way to reduce the CPU memory usage.
+        # This buffer is not directly transferred to the GPU, so it does not
+        # need to be pinned.
         self.token_ids_cpu_tensor = torch.zeros(
             (max_num_reqs, max_model_len),
             device="cpu",
             dtype=torch.int32,
-            pin_memory=pin_memory,
+            pin_memory=False,
         )
         self.token_ids_cpu = self.token_ids_cpu_tensor.numpy()
         self.num_computed_tokens_cpu = np.empty(max_num_reqs, dtype=np.int32)
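For background, a minimal sketch (not part of the commit; the buffer names and sizes below are made up) of why pin_memory only matters for tensors that are actually copied to the GPU: pinned (page-locked) host memory lets PyTorch perform faster, optionally asynchronous host-to-device copies, whereas a tensor that is only ever read on the CPU, as token_ids_cpu_tensor is through its NumPy view, gains nothing from pinning and would just tie up page-locked memory.

import torch

# Illustrative sizes only; the real values come from the vLLM config.
max_num_reqs, max_model_len = 8, 1024

# A staging buffer that is copied to the GPU benefits from pinning:
# page-locked memory enables a faster, non-blocking host-to-device copy.
staging = torch.zeros(
    (max_num_reqs, max_model_len),
    device="cpu",
    dtype=torch.int32,
    pin_memory=torch.cuda.is_available(),
)
if torch.cuda.is_available():
    staging_gpu = staging.to("cuda", non_blocking=True)

# A CPU-only buffer, like token_ids_cpu_tensor after this commit, is never
# transferred to the device, so pinning it would only waste page-locked memory.
cpu_only = torch.zeros(
    (max_num_reqs, max_model_len),
    device="cpu",
    dtype=torch.int32,
    pin_memory=False,
)
cpu_only_view = cpu_only.numpy()  # zero-copy NumPy view over the same storage

Note that pin_memory=True is only meaningful (and only safe to request) when a CUDA device is present, which is why the staging example guards it with torch.cuda.is_available().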
