Commit 05af6da

[ROCm] enable cupy in order to enable cudagraph mode for AMD GPUs (#3123)
Co-authored-by: lcskrishna <[email protected]>
1 parent 9a4548b commit 05af6da

2 files changed: 26 additions and 8 deletions

Dockerfile.rocm

Lines changed: 25 additions & 5 deletions
@@ -23,6 +23,9 @@ RUN echo "FA_BRANCH is $FA_BRANCH"
 # In that case, we need to use the python reference attention implementation in vllm
 ARG BUILD_FA="1"
 
+# whether to build cupy on rocm
+ARG BUILD_CUPY="1"
+
 # Install some basic utilities
 RUN apt-get update && apt-get install python3 python3-pip -y
 
@@ -70,16 +73,33 @@ RUN if [ "$BUILD_FA" = "1" ]; then \
     && cd ..; \
     fi
 
-COPY ./ /app/vllm
-
-RUN python3 -m pip install --upgrade pip
-RUN python3 -m pip install xformers==0.0.23 --no-deps
-
 # Error related to odd state for numpy 1.20.3 where there is no METADATA etc, but an extra LICENSES_bundled.txt.
 # Manually removed it so that later steps of numpy upgrade can continue
 RUN if [ "$BASE_IMAGE" = "rocm/pytorch:rocm6.0_ubuntu20.04_py3.9_pytorch_2.1.1" ]; then \
     rm -rf /opt/conda/envs/py_3.9/lib/python3.9/site-packages/numpy-1.20.3.dist-info/; fi
 
+# build cupy
+RUN if [ "$BUILD_CUPY" = "1" ]; then \
+    mkdir -p libs \
+    && cd libs \
+    && git clone -b hipgraph_enablement --recursive https://github.com/ROCm/cupy.git \
+    && cd cupy \
+    && pip install mpi4py-mpich \
+    && pip install scipy==1.9.3 \
+    && pip install cython==0.29.* \
+    && env CC=$MPI_HOME/bin/mpicc python -m pip install mpi4py \
+    && export CUPY_INSTALL_USE_HIP=1 \
+    && export ROCM_HOME=/opt/rocm \
+    && export HCC_AMDGPU_TARGET="gfx90a,gfx942,gfx1100" \
+    && pip install . \
+    && cd ..; \
+    fi
+
+COPY ./ /app/vllm
+
+RUN python3 -m pip install --upgrade pip
+RUN python3 -m pip install xformers==0.0.23 --no-deps
+
 RUN cd /app \
     && cd vllm \
     && pip install -U -r requirements-rocm.txt \

vllm/worker/worker.py

Lines changed: 1 addition & 3 deletions
@@ -19,7 +19,6 @@
 from vllm.worker.cache_engine import CacheEngine
 from vllm.worker.model_runner import ModelRunner
 from vllm.lora.request import LoRARequest
-from vllm.utils import is_hip
 
 
 class Worker:
@@ -267,8 +266,7 @@ def init_distributed_environment(
                 "cupy.distributed is already initialized but the cupy world "
                 "size does not match parallel_config.world_size "
                 f"({cupy_world_size} vs. {parallel_config.world_size}).")
-    elif (parallel_config.world_size > 1 and cupy_port is not None
-          and not is_hip()):
+    elif (parallel_config.world_size > 1 and cupy_port is not None):
         # NOTE(woosuk): We don't initialize CuPy process group when world size
         # is 1.
         # TODO(woosuk): Support multi-node connection.
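
The behavioral effect of the `worker.py` change can be illustrated with a small standalone sketch. It models only the `elif` condition from the hunk above, not the surrounding `init_distributed_environment` logic. The function names `should_init_cupy_before` and `should_init_cupy_after`, the `on_hip` parameter, and the port value 29500 are hypothetical and chosen for illustration; they are not part of vLLM.

```python
from typing import Optional


def should_init_cupy_before(world_size: int, cupy_port: Optional[int],
                            on_hip: bool) -> bool:
    """Condition as it read before this commit: HIP (ROCm) was excluded."""
    return world_size > 1 and cupy_port is not None and not on_hip


def should_init_cupy_after(world_size: int, cupy_port: Optional[int]) -> bool:
    """Condition after this commit: the platform is no longer consulted."""
    return world_size > 1 and cupy_port is not None


if __name__ == "__main__":
    # (world_size, cupy_port, on_hip); 29500 is an arbitrary example port.
    cases = [(1, 29500, True), (2, None, True), (2, 29500, False), (2, 29500, True)]
    for world_size, port, on_hip in cases:
        before = should_init_cupy_before(world_size, port, on_hip)
        after = should_init_cupy_after(world_size, port)
        print(f"world_size={world_size} port={port} hip={on_hip}: "
              f"before={before} after={after}")
```

Under these assumptions, the only case whose outcome changes is a multi-GPU run with a CuPy port set on a HIP platform, which is the case the commit title describes.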
