Failed to detect NVIDIA driver version since vllm 0.2.0 #1268

Closed
chrislemke opened this issue Oct 5, 2023 · 1 comment
@chrislemke

Hey.
I recently updated from 0.1.3 to 0.2.0, and since then I have had a problem.

The following Docker container is deployed through SageMaker:

FROM nvcr.io/nvidia/pytorch:22.12-py3
ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get -y update \
    && apt-get -y install gcc \
    && pip uninstall torch -y

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

And the requirements.txt:

vllm==0.2.0
Flask==2.3.2
gunicorn==20.1.0
sentence_transformers==2.2.2
accelerate==0.23.0
huggingface_hub==0.17.3
typing-inspect==0.9.0
typing_extensions==4.8.0
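
To narrow this kind of failure down, it can help to probe the driver inside the container before Ray Serve even starts. Below is a minimal sketch, assuming the pynvml package is installed (pip install pynvml); check_gpu.py is a hypothetical filename, not part of the deployment above:

# check_gpu.py: probe the NVIDIA driver the way the error message implies.
import pynvml
import torch

try:
    pynvml.nvmlInit()
    version = pynvml.nvmlSystemGetDriverVersion()
    if isinstance(version, bytes):  # older pynvml releases return bytes
        version = version.decode()
    print("NVIDIA driver version:", version)
    pynvml.nvmlShutdown()
except pynvml.NVMLError as err:
    # Same failure mode as "Failed to detect NVIDIA driver version."
    print("NVML could not detect a driver:", err)

print("torch sees CUDA:", torch.cuda.is_available())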

When creating the Inference service, CloudWatch then repeatedly spits out the following logs:

=============
== PyTorch ==
=============
NVIDIA Release 22.12 (build 49968248)
PyTorch Version 1.14.0a0+410ce96
Container image Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright (c) 2014-2022 Facebook Inc.
...
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
ERROR: No supported GPU(s) detected to run this container
Failed to detect NVIDIA driver version.
Usage: serve [OPTIONS] COMMAND [ARGS]...

  CLI for managing Serve applications on a Ray cluster.

Options:
  -h, --help  Show this message and exit.

Commands:
  build     Generate a config file for the specified applications.
  config    Gets the current config(s) of Serve application(s) on the...
  deploy    Deploy Serve application(s) from a YAML config file.
  run       Run Serve application(s).
  shutdown  Shuts down Serve on the cluster, deleting all applications.
  start     Start Serve on the Ray cluster.
  status    Get the current status of all Serve applications on the cluster.

When I use exactly the same settings but with vllm version 0.1.3, it works. Any idea? Probably I am just doing something terribly wrong ;-)

Thanks in advance!

@WoosukKwon
Collaborator

Hi @chrislemke, I think this issue is due to the recent PyTorch v2.1.0 release and is solved by #1290. Could you try it again?
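
Once a release containing that fix is installed, a quick smoke test can confirm that vllm sees the GPU again. A minimal sketch, assuming the small public Hugging Face checkpoint facebook/opt-125m is reachable from the container:

# smoke_test.py: hypothetical one-shot generation to confirm GPU inference works.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # loading fails early if no driver/GPU is detected
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=8))
print(outputs[0].outputs[0].text)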
