ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory #1369

Closed · DoliteMatheo opened this issue Oct 16, 2023 · 22 comments
Labels: installation (Installation problems)

@DoliteMatheo

DoliteMatheo commented Oct 16, 2023

When I used vllm to serve my local model, the terminal displayed the following message:
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory
The traceback pointed to the following code in site-packages/vllm/utils.py, and executing this single line by itself also triggers the same error:

"from vllm import cuda_utils"

I suppose it may be caused by a mismatch between vllm and my CUDA or PyTorch version. The CUDA version on my machine is 12.2 (the only version installed), and installing CUDA 11 alongside it is not so convenient. The PyTorch version is 2.1.0 and the vllm version is 0.2.0.
How can I solve the problem without installing CUDA 11?
Many thanks!

@alan1989

I encountered the same problem, did you solve it?

@DoliteMatheo
Author

> I encountered the same problem, did you solve it?

Unfortunately, I didn't find a better solution than installing CUDA 11, but I don't want to change the CUDA version since the machine is not mine, and re-installing CUDA often causes many more unexpected problems. If you find a solution, please tell me; much appreciated.

@bhupendrathore

bhupendrathore commented Oct 16, 2023

I tried with CUDA 12.2 and get the same error. When trying with CUDA 11.7, I get the following error:
RuntimeError: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org/ to install a PyTorch version that has been compiled with your version of the CUDA driver.

I updated my xformers with pip install xformers==v0.0.22 and it works fine.

I am using the CUDA 11.7 docker image.
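
For reference, a minimal sketch of that working combination (the image tag, and Python being installable via apt, are assumptions; untested as written):

docker run --gpus all -it --rm nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04 bash
# inside the container:
apt-get update && apt-get install -y python3 python3-pip
pip3 install vllm==0.2.0
pip3 install xformers==0.0.22   # per this thread, pinning xformers to 0.0.22 avoided the libcudart.so.11.0 import error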

@alan1989

I have solved it. First find the libcudart.so.11.0 path on your disk, then add it to LD_LIBRARY_PATH:

locate libcudart.so.11.0
export LD_LIBRARY_PATH=<directory containing libcudart.so.11.0>:$LD_LIBRARY_PATH
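
A slightly expanded sketch of the same fix (the paths below are examples only; use whatever your search returns):

find / -name 'libcudart.so.11.0' 2>/dev/null     # alternative if locate is not available
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH   # example directory; use the one you found
# optionally append the export line to ~/.bashrc to make it persistent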

@vemonet

vemonet commented Oct 16, 2023

We are getting the same "error", but with CUDA 12.1

I am not sure whose fault it is, but throwing an error because "we cannot find a file we installed ourselves, so we crash everything" is a bit ridiculous.

I did not see any restriction against using CUDA 12 in the vLLM docs, so one would expect vLLM to work with the latest CUDA version.

Here is the code to reproduce (see below for which docker image to run it in):

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.llms import VLLM

llm = VLLM(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    max_new_tokens=8000,
    top_k=10,
    top_p=0.95,
    temperature=0.8,
)

conversation = ConversationChain(
    llm=llm, verbose=True, memory=ConversationBufferMemory()
)

print(conversation.predict(input="Hi mom!"))

Here is the full error we get:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/langchain/llms/vllm.py", line 79, in validate_environment
    from vllm import LLM as VLLModel
  File "/usr/local/lib/python3.10/dist-packages/vllm/__init__.py", line 3, in <module>
    from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/arg_utils.py", line 6, in <module>
    from vllm.config import (CacheConfig, ModelConfig, ParallelConfig,
  File "/usr/local/lib/python3.10/dist-packages/vllm/config.py", line 8, in <module>
    from vllm.utils import get_cpu_memory
  File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 8, in <module>
    from vllm import cuda_utils
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/share/code-llama/run_vllm.py", line 5, in <module>
    llm = VLLM(
  File "/usr/local/lib/python3.10/dist-packages/langchain/load/serializable.py", line 97, in __init__
    super().__init__(**kwargs)
  File "pydantic/main.py", line 339, in pydantic.main.BaseModel.__init__
  File "pydantic/main.py", line 1102, in pydantic.main.validate_model
  File "/usr/local/lib/python3.10/dist-packages/langchain/llms/vllm.py", line 81, in validate_environment
    raise ImportError(
ImportError: Could not import vllm python package. Please install it with `pip install vllm`.

1. We use official nvidia images

Here is our setup: We are literally using the official CUDA image from nvidia: nvcr.io/nvidia/cuda:12.1.0-devel-ubuntu22.04

Starting from that image, the message "can't find libcudart" has no reason to exist.

2. We made sure to have the right CUDA version

We don't use the pytorch image recommended by the vllm docs, because with it we can't control exactly which CUDA version gets installed, and then we get errors like "muuuuh pytorch was compiled with a different CUDA version". Also, the pytorch GPU image is ~9G vs ~3G for the CUDA image.

nvidia-smi shows we are using CUDA 12.1:

| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |

pip list | grep cuda shows we have CUDA 12.1 packages installed everywhere:

nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105

3. It was working on CUDA 12.1 last week 🫠

We managed to make it work last week when running in an old pytorch docker image that was still on Python 3.8. But now it is broken when running on up-to-date images (what a mess), always complaining that it cannot find this non-existent libcudart file.

And last week, when it was working, the main GPU was still on CUDA 12.1 (according to nvidia-smi), but with some old 11.7 pip packages installed (as I said, it's an old pytorch image running Python 3.8, what a wonderful mess, but it seems like vLLM thrives in the mess, since that's the only time it worked!):

cuda-python                   12.1.0rc5+1.gc7fd38c.dirty
cupy-cuda12x                  12.0.0b3
dask-cuda                     23.2.0
nvidia-cuda-cupti-cu11        11.7.101
nvidia-cuda-nvrtc-cu11        11.7.99
nvidia-cuda-runtime-cu11      11.7.99
nvidia-dali-cuda110           1.23.0

Meaning vLLM can work on CUDA 12 drivers, and we don't need to reinstall CUDA 11; some CUDA 11 runtime libs should be enough: nvidia-cuda-runtime-cu11, nvidia-cuda-nvrtc-cu11, or nvidia-cuda-cupti-cu11.
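
A hedged sketch of that runtime-libs-only idea (untested; the package names are the cu11 wheels listed above, and the site-packages layout is an assumption):

pip install nvidia-cuda-runtime-cu11 nvidia-cuda-nvrtc-cu11 nvidia-cuda-cupti-cu11
# the wheels unpack their .so files under site-packages/nvidia/<component>/lib
export LD_LIBRARY_PATH=$(python3 -c "import site; print(site.getsitepackages()[0])")/nvidia/cuda_runtime/lib:$LD_LIBRARY_PATH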

4. Trying the locate libcudart fix

Running locate libcudart.so.11.0 does not find anything inside the CUDA docker image, presumably because we have CUDA 12 installed.

5. Conclusion

vLLM is really sensitive to the CUDA version. It would be really helpful for vLLM to provide a bit of documentation around it, e.g.:

  • vLLM only works on CUDA 11 by default
  • Steps to make vLLM work on CUDA 12
  • Ideally, stop recommending the Pytorch docker image; it is really hard to find out exactly which CUDA version that image is using. It is much more reliable to start from the CUDA image and then add pytorch (pip install works just as well; see the sketch after this list).
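
A minimal sketch of that last suggestion, assuming the CUDA 12.1 image mentioned above and the cu121 PyTorch wheel index (versions are assumptions, not a tested recipe):

# run inside a container started from nvcr.io/nvidia/cuda:12.1.0-devel-ubuntu22.04
apt-get update && apt-get install -y python3 python3-pip
pip3 install torch --index-url https://download.pytorch.org/whl/cu121
pip3 install vllm   # note: at the time of this thread, vllm itself still pinned CUDA 11 dependencies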

@vemonet

vemonet commented Oct 16, 2023

A potential approach to fix it: it could be due to the torch version, which is pinned to 2.0.1: https://github.com/vllm-project/vllm/blob/main/pyproject.toml#L6

Because torch 2.0.1 does not have a build for CUDA 12 (only CUDA 11), maybe installing a newer torch version would work.

I'll try to rebuild vllm without the torch version constraint to see if that helps.
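
A rough sketch of what that rebuild could look like (untested; the repository URL, the pin edit, and the cu121 torch wheel are assumptions):

git clone https://github.com/vllm-project/vllm.git
cd vllm
# relax the torch pin by hand in pyproject.toml and requirements.txt,
# e.g. change "torch == 2.0.1" to "torch >= 2.0.0"
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install -e . --no-build-isolation   # build against the already-installed torch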

@copasseron

> A potential approach to fix it: it could be due to the torch version, which is pinned to 2.0.1: https://github.com/vllm-project/vllm/blob/main/pyproject.toml#L6
>
> Because torch 2.0.1 does not have a build for CUDA 12 (only CUDA 11), maybe installing a newer torch version would work.
>
> I'll try to rebuild vllm without the torch version constraint to see if that helps.

Let me know if you've got any news here.

I've had the same problem since this morning, also with an nvidia image: nvcr.io/nvidia/tritonserver:23.09-py3.

@vemonet

vemonet commented Oct 16, 2023

It's weird because the latest vllm release actually uses torch >= 2.0.0, so I should be able to use torch 2.1.0 with vllm 0.2.0.

But installing vllm always installs torch 2.0.1, and it is due to:

xformers 0.0.22 requires torch==2.0.1, but you have torch 2.1.0 which is incompatible.

If we try to pip install --upgrade xformers:

vllm 0.2.0 requires xformers==0.0.22, but you have xformers 0.0.22.post4 which is incompatible.

But the requirements.txt of release v0.2.0 indicates xformers >= 0.0.22

And whatever combination I try, I always get errors, most of the time this one:

INFO 10-16 16:23:57 llm_engine.py:72] Initializing an LLM engine with config: model='mistralai/Mistral-7B-Instruct-v0.1', tokenizer='mistralai/Mistral-7B-Instruct-v0.1', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/workspace/share/code-llama/run_vllm.py", line 5, in <module>
    llm = VLLM(
  File "/usr/local/lib/python3.10/dist-packages/langchain/load/serializable.py", line 97, in __init__
    super().__init__(**kwargs)
  File "pydantic/main.py", line 339, in pydantic.main.BaseModel.__init__
  File "pydantic/main.py", line 1102, in pydantic.main.validate_model
  File "/usr/local/lib/python3.10/dist-packages/langchain/llms/vllm.py", line 86, in validate_environment
    values["client"] = VLLModel(
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 93, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 231, in from_engine_args
    engine = cls(*engine_configs,
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 110, in __init__
    self._init_workers(distributed_init_method)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 128, in _init_workers
    from vllm.worker.worker import Worker  # pylint: disable=import-outside-toplevel
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 10, in <module>
    from vllm.model_executor import get_model, InputMetadata, set_random_seed
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/__init__.py", line 2, in <module>
    from vllm.model_executor.model_loader import get_model
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 10, in <module>
    from vllm.model_executor.models import *  # pylint: disable=wildcard-import
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/__init__.py", line 1, in <module>
    from vllm.model_executor.models.aquila import AquilaForCausalLM
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/aquila.py", line 35, in <module>
    from vllm.model_executor.layers.attention import PagedAttentionWithRoPE
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/attention.py", line 10, in <module>
    from vllm import attention_ops
ImportError: /usr/local/lib/python3.10/dist-packages/vllm/attention_ops.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1010TensorImpl27throw_data_ptr_access_errorEv

@WoosukKwon added the installation (Installation problems) label Oct 16, 2023
@WoosukKwon
Collaborator

If I understand the problem correctly, the issue was that v0.2.0 didn't pin the pytorch and xformers versions. In v0.2.1, which was released today, we pinned their versions, so the error should not happen as long as you use CUDA 11.8.

We will support CUDA 12 once xformers releases a new stable version with CUDA 12 support. (While xformers==0.0.22.post4 seems to include CUDA 12 binaries, I feel it's a bit unstable at the moment).
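
In other words, in a CUDA 11.8 environment the upgrade path should be as simple as (a sketch, untested):

pip install --upgrade vllm==0.2.1   # v0.2.1 pins compatible torch and xformers versions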

@gesanqiu
Contributor

gesanqiu commented Oct 17, 2023

> If I understand the problem correctly, the issue was that v0.2.0 didn't pin the pytorch and xformers versions. In v0.2.1, which was released today, we pinned their versions, so the error should not happen as long as you use CUDA 11.8.
>
> We will support CUDA 12 once xformers releases a new stable version with CUDA 12 support. (While xformers==0.0.22.post4 seems to include CUDA 12 binaries, I feel it's a bit unstable at the moment.)

Right now PyTorch 2.0.1 is bound to CUDA 11.7, so compiling vLLM with CUDA 11.8 will fail, the same issue as #1283.
I used nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04 and installed vLLM with pip install -e . successfully.

@DoliteMatheo
Author

> If I understand the problem correctly, the issue was that v0.2.0 didn't pin the pytorch and xformers versions. In v0.2.1, which was released today, we pinned their versions, so the error should not happen as long as you use CUDA 11.8.
>
> We will support CUDA 12 once xformers releases a new stable version with CUDA 12 support. (While xformers==0.0.22.post4 seems to include CUDA 12 binaries, I feel it's a bit unstable at the moment.)

Finally I installed CUDA 11.7 manually and the problem was fixed immediately. It seems that vllm cannot work if only CUDA 12 is installed on the machine.
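
For anyone in the same situation, a rough sketch of installing the CUDA 11.7 toolkit alongside CUDA 12 without touching the driver (the runfile name/URL and flags are assumptions; check NVIDIA's download archive for the exact installer):

wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda_11.7.1_515.65.01_linux.run
sudo sh cuda_11.7.1_515.65.01_linux.run --silent --toolkit --toolkitpath=/usr/local/cuda-11.7   # toolkit only, no driver change
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH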

@s-natsubori

I encountered the same problem,

  • docker base image : nvcr.io/nvidia/pytorch:22.12-py3
  • vllm==0.2.0

and I solved this error by adding the following line to my requirements.txt:
xformers==0.0.22

xformers==0.0.22 requires nvidia-cuda-runtime-cu11==11.7.99 and so on. Unfortunately it uninstalls the originally installed PyTorch 2.1.0, but my code is working!!
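
For reference, the equivalent direct pip command (a sketch; the downgrade behaviour is as described in this comment):

pip install xformers==0.0.22   # pulls nvidia-cuda-runtime-cu11==11.7.99 and downgrades torch to 2.0.1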

@bitsnaps

bitsnaps commented Nov 8, 2023

I'm getting the same error on Colab with TheBloke-Dolphin-2.1-mistral-7B-GPTQ, which was working before...

@sanjana-sudo

> I'm getting the same error on Colab with TheBloke-Dolphin-2.1-mistral-7B-GPTQ, which was working before...

@bitsnaps Same problem here. Did you find any solution?

@bitsnaps

bitsnaps commented Nov 9, 2023

> > I'm getting the same error on Colab with TheBloke-Dolphin-2.1-mistral-7B-GPTQ, which was working before...
>
> @bitsnaps Same problem here. Did you find any solution?

Not yet. I believe this is something to do with a mistral/transformers/huggingface issue (not vllm); I'm not even able to run mistral-7b on Colab, which was working fine last week.

@sanjana-sudo

sanjana-sudo commented Nov 16, 2023

> > > I'm getting the same error on Colab with TheBloke-Dolphin-2.1-mistral-7B-GPTQ, which was working before...
> >
> > @bitsnaps Same problem here. Did you find any solution?
>
> Not yet. I believe this is something to do with a mistral/transformers/huggingface issue (not vllm); I'm not even able to run mistral-7b on Colab, which was working fine last week.

@bitsnaps I tried to run Mistral_7B_Instruct_v0_1_GGUF now and it's working. I just downgraded gradio to gradio==3.32.0 and did not change anything related to flash-attn.

@s-natsubori

s-natsubori commented Nov 16, 2023

Currently, AutoAWQ delivers two builds (CUDA 11 and CUDA 12).
I recommend trying whichever combination suits your environment, and check your other packages too.

pip install autoawq   # torch 2.1.0 + CUDA 12.1.1

From GitHub (torch 2.0 + CUDA 11.8):
pip install https://github.com/casper-hansen/AutoAWQ/releases/download/v0.1.6/autoawq-0.1.6+cu118-cp310-cp310-linux_x86_64.whl

@D-Octopus

D-Octopus commented Nov 28, 2023

Have a go at updating vllm to v0.2.2. Looks like they've sorted out this issue in that version.
Among the v0.2.2 major changes: Upgrade to CUDA 12 (#1527).
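
That is, something along these lines (a sketch; a CUDA 12 environment with a matching torch is assumed):

pip install --upgrade vllm==0.2.2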

@LI-ZHAODONG

I'm using the llmware library and was facing the same error. I upgraded torch (2.0.1 -> 2.1.0) and that solved the problem.
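
A sketch of that upgrade (the cu121 wheel index is an assumption; a plain pip install torch==2.1.0 may also work depending on your setup):

pip install --upgrade torch==2.1.0 --index-url https://download.pytorch.org/whl/cu121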

@ibnzahoor98

> pip install xformers==v0.0.22

Thank you! Worked like a charm!

@hmellor closed this as completed Apr 4, 2024
@Provemj

Provemj commented Oct 17, 2024

It's probably just a CUDA or torch version problem; try downgrading.

@HarikrishnanK9

> I have solved it. First find the libcudart.so.11.0 path on your disk, then add it to LD_LIBRARY_PATH:
>
> locate libcudart.so.11.0
> export LD_LIBRARY_PATH=<directory containing libcudart.so.11.0>:$LD_LIBRARY_PATH

Thank you @alan1989, it worked for me: export LD_LIBRARY_PATH=/home/hari/anaconda3/envs/prod_env/lib
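
For other conda users, a hedged sketch of locating the env's lib directory before exporting it (the env path below is just the example from this comment):

find ~/anaconda3/envs -name 'libcudart.so.11*' 2>/dev/null
export LD_LIBRARY_PATH=/home/hari/anaconda3/envs/prod_env/lib:$LD_LIBRARY_PATH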
