Met segment fault while running Whisper on Arc #13001


Closed
Ruoyu-y opened this issue Mar 25, 2025 · 16 comments · May be fixed by #13049

@Ruoyu-y

Ruoyu-y commented Mar 25, 2025

Configuration:

OS: Ubuntu 24.04
CPU: 12th Gen Intel(R) Core(TM) i9-12900K
Memory: 16G
GPU: 04:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08)
Software:
    torch                          2.1.0a0+cxx11.abi
    intel-extension-for-pytorch    2.1.10+xpu
    ipex-llm                       2.2.0b20250322
    bigdl-core-xe-21               2.6.0b20250322

Issue:
Running Whisper with python ./recognize.py crashes with a segmentation fault.

Logs:

$ python recognize.py
/home/cloud/ruoyu/miniforge3/envs/llm/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/home/cloud/ruoyu/miniforge3/envs/llm/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2025-03-25 09:18:43,572 - INFO - intel_extension_for_pytorch auto imported
2025-03-25 09:18:43,855 - INFO - PyTorch version 2.1.0a0+cxx11.abi available.
step1:
/home/cloud/ruoyu/miniforge3/envs/llm/lib/python3.11/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
2025-03-25 09:18:46,419 - INFO - Converting the current model to sym_int4 format......

LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: adl [12th Gen Intel(R) Core(TM) i9-12900K]
Registry and code: 13 MB
Command: python recognize.py
Uptime: 3.432546 s
Segmentation fault
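A bare "Segmentation fault" from a Python process gives no hint of where the crash happened. One generic first step (independent of ipex-llm) is the standard-library faulthandler module, which installs a SIGSEGV handler that prints the Python-level traceback to stderr before the process dies; it can also be enabled without code changes by setting PYTHONFAULTHANDLER=1. A self-contained demonstration that forces a crash in a child process:

```python
import subprocess
import sys
import textwrap

# Child process: enable faulthandler early, then trigger a real SIGSEGV
# via ctypes so faulthandler can print the Python traceback on the way down.
child = textwrap.dedent("""
    import faulthandler, ctypes
    faulthandler.enable()
    ctypes.string_at(0)   # NULL dereference -> segmentation fault
""")

proc = subprocess.run([sys.executable, "-c", child],
                      capture_output=True, text=True)

# faulthandler reports the fatal signal and the offending Python line on stderr.
print("Fatal Python error" in proc.stderr)
```

Adding faulthandler.enable() at the top of recognize.py would show which Python line triggers the native crash, which narrows down whether it happens during model conversion or during the first XPU call.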

@Ruoyu-y
Author

Ruoyu-y commented Mar 25, 2025

Any hints on this issue, or a recommended configuration?

@hkvision
Contributor

Hi,

May I ask whether this segmentation fault occurs only with Whisper, or also when running other models from https://github.com/intel/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/LLM?

Also, you may run our environment-check script so that we can better diagnose the issue: https://github.com/intel/ipex-llm/tree/main/python/llm/scripts#usage

@Ruoyu-y
Author

Ruoyu-y commented Mar 26, 2025

> May I ask whether this segmentation fault occurs only with Whisper, or also when running other models from https://github.com/intel/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/LLM?
>
> Also, you may run our environment-check script so that we can better diagnose the issue: https://github.com/intel/ipex-llm/tree/main/python/llm/scripts#usage

Other LLMs also hit the segmentation fault, but they work inside a Docker container. Here's the output of the environment-check script:

$ bash env-check.sh
-----------------------------------------------------------------
PYTHON_VERSION=3.11.11
-----------------------------------------------------------------
transformers=4.36.2
-----------------------------------------------------------------
torch=2.1.0a0+cxx11.abi
-----------------------------------------------------------------
ipex-llm Version: 2.2.0b20250322
-----------------------------------------------------------------
ipex=2.1.10+xpu
-----------------------------------------------------------------
CPU Information:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        46 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               24
On-line CPU(s) list:                  0-23
Vendor ID:                            GenuineIntel
Model name:                           12th Gen Intel(R) Core(TM) i9-12900K
CPU family:                           6
Model:                                151
Thread(s) per core:                   2
Core(s) per socket:                   16
Socket(s):                            1
Stepping:                             2
CPU(s) scaling MHz:                   22%
CPU max MHz:                          5200.0000
CPU min MHz:                          800.0000
-----------------------------------------------------------------
Total CPU Memory: 15.3286 GB
Memory Type: DDR5
-----------------------------------------------------------------
Operating System:
Ubuntu 24.04 LTS \n \l

-----------------------------------------------------------------
Linux cloudgpu 6.8.0-52-generic #53-Ubuntu SMP PREEMPT_DYNAMIC Sat Jan 11 00:06:25 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
-----------------------------------------------------------------
CLI:
    Version: 1.2.39.20240906
    Build ID: 11f3c29a

Service:
    Version: 1.2.39.20240906
    Build ID: 11f3c29a
    Level Zero Version: 1.17.0
-----------------------------------------------------------------
  Driver Version                                  2023.16.12.0.12_195853.xmain-hotfix
  Driver Version                                  2023.16.12.0.12_195853.xmain-hotfix
-----------------------------------------------------------------
Driver related package version:
ii  intel-fw-gpu                                     2024.17.5-329~22.04                      all          Firmware package for Intel integrated and discrete GPUs
ii  intel-level-zero-gpu                             1.3.29735.27-914~22.04                   amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  intel-level-zero-gpu-raytracing                  1.0.0-60~u22.04                          amd64        Level Zero Ray Tracing Support library
-----------------------------------------------------------------
igpu not detected
-----------------------------------------------------------------
xpu-smi is properly installed.
-----------------------------------------------------------------
No device discovered
GPU0 Memory ize=256M
-----------------------------------------------------------------
04:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
        Subsystem: Shenzhen Gunnir Technology Development Co., Ltd DG2 [Arc A770]
        Flags: bus master, fast devsel, latency 0, IRQ 234, IOMMU group 20
        Memory at 86000000 (64-bit, non-prefetchable) [size=16M]
        Memory at 4050000000 (64-bit, prefetchable) [size=256M]
        Expansion ROM at 87000000 [disabled] [size=2M]
        Capabilities: <access denied>
        Kernel driver in use: i915
        Kernel modules: i915, xe
-----------------------------------------------------------------

Is there anything wrong with the configuration?

@Ruoyu-y
Author

Ruoyu-y commented Mar 26, 2025

To provide more details: on the same machine, I can run the inference service in Docker following the guide https://github.com/intel/ipex-llm/blob/main/docs/mddocs/DockerGuides/vllm_docker_quickstart.md, but I cannot run Whisper or the other LLMs under the python/llm/example/GPU/HuggingFace/LLM folder on my host. I also tried running the Whisper Python file inside a Docker container brought up following that guide, and it failed as well. Please take a look @hkvision, thanks a lot!

@hkvision
Contributor

Hi, we checked your environment, and the following part might be the problem.

-----------------------------------------------------------------
No device discovered
GPU0 Memory ize=256M

Could you run sycl-ls and xpu-smi discovery to confirm whether the Arc device is properly detected? Thanks!

@Ruoyu-y
Author

Ruoyu-y commented Mar 27, 2025

> xpu-smi discovery

xpu-smi discovery returns "No device discovered", but I can find the Arc card using lspci. I am using the in-tree driver on Ubuntu 24.04; could that be causing the issue? @hkvision

@hkvision
Contributor

hkvision commented Mar 27, 2025

From your lspci result below, the 256M memory size does not look correct; it should be 16G. Could you check whether the card is set up properly (e.g. Resizable BAR enabled)? Also, is the output of sycl-ls as expected on your machine?

04:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
        Subsystem: Shenzhen Gunnir Technology Development Co., Ltd DG2 [Arc A770]
        Flags: bus master, fast devsel, latency 0, IRQ 234, IOMMU group 20
        Memory at 86000000 (64-bit, non-prefetchable) [size=16M]
        Memory at 4050000000 (64-bit, prefetchable) [size=256M]
        Expansion ROM at 87000000 [disabled] [size=2M]
        Capabilities: <access denied>
        Kernel driver in use: i915
        Kernel modules: i915, xe
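The prefetchable BAR in that listing is the CPU-visible window into VRAM; on an A770 16GB with Resizable BAR enabled it typically reads [size=16G], so [size=256M] suggests ReBAR is disabled in firmware. A small sanity check over the pasted output (an illustrative helper, not part of any Intel tool; note the regex must not accidentally match the non-prefetchable BAR line):

```python
import re

# lspci -v output for the A770, abridged from the listing above.
LSPCI = """\
04:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08)
        Memory at 86000000 (64-bit, non-prefetchable) [size=16M]
        Memory at 4050000000 (64-bit, prefetchable) [size=256M]
"""

UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

def prefetchable_bar_bytes(text: str) -> int:
    """Size of the prefetchable BAR, i.e. the CPU-visible VRAM aperture."""
    # ", prefetchable)" avoids matching the "non-prefetchable" BAR line.
    m = re.search(r", prefetchable\) \[size=(\d+)([KMG])\]", text)
    if m is None:
        raise ValueError("no prefetchable BAR found in lspci output")
    size, unit = m.groups()
    return int(size) * UNITS[unit]

vram = 16 * UNITS["G"]          # A770 16GB variant
bar = prefetchable_bar_bytes(LSPCI)
print(f"aperture {bar >> 20}M, full VRAM visible: {bar >= vram}")
# -> aperture 256M, full VRAM visible: False
```

A 256M aperture is the legacy (pre-ReBAR) size, which is consistent with the card being poorly detected by the compute stack.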

@Ruoyu-y
Author

Ruoyu-y commented Mar 28, 2025

> From your lspci result below, the 256M memory size does not look correct; it should be 16G. Could you check whether the card is set up properly (e.g. Resizable BAR enabled)? Also, is the output of sycl-ls as expected on your machine?

No Arc device shows up in the sycl-ls result. How should I fix this? When I previously ran ipex-llm inside Docker, sycl-ls did find the Arc card.

@hkvision
Contributor

We suspect this is not an ipex-llm issue but is probably caused by the driver-related packages.
You may refer to https://dgpu-docs.intel.com/driver/client/overview.html#installing-client-gpus-on-ubuntu-desktop-24-04-lts for the driver guide.
The Docker environment (Ubuntu 22.04) is defined here: https://github.com/intel/ipex-llm/blob/main/docker/llm/serving/xpu/docker/Dockerfile

@Ruoyu-y
Author

Ruoyu-y commented Apr 3, 2025

> We suspect this is not an ipex-llm issue but is probably caused by the driver-related packages. You may refer to https://dgpu-docs.intel.com/driver/client/overview.html#installing-client-gpus-on-ubuntu-desktop-24-04-lts for the driver guide. The Docker environment (Ubuntu 22.04) is defined here: https://github.com/intel/ipex-llm/blob/main/docker/llm/serving/xpu/docker/Dockerfile

I followed the guide you provided and reinstalled the driver. Using the clinfo | grep "770" command given at the end of the tutorial, I can now see the device. I then installed the other dependencies following https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/install_linux_gpu.md#install-oneapi, and everything seemed fine, but in the end I still hit the segmentation fault. Any other suggestions?

@Ruoyu-y
Author

Ruoyu-y commented Apr 3, 2025

Or is there an example of running Whisper in a Docker container?

@Ruoyu-y
Author

Ruoyu-y commented Apr 3, 2025

It would also work for me to run it in Docker. Here's the error message I got when running the example inside a Docker container:
(screenshot)

After installing the missing dependency with pip install trl, I got another error:
(screenshot)

@Ruoyu-y
Author

Ruoyu-y commented Apr 4, 2025

Thanks for the guidance. The issue has been resolved.

@Ruoyu-y Ruoyu-y closed this as completed Apr 4, 2025
@hkvision
Contributor

hkvision commented Apr 7, 2025

Synced offline, pip install trl==0.11.0 solves the problem.
Feel free to tell us if there are further issues later :)
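Since the crash came from an unpinned dependency, a small startup guard can fail fast with a readable message instead of a confusing error later. This is an illustrative sketch, not part of the ipex-llm examples; the version comes from the comment above:

```python
from importlib.metadata import PackageNotFoundError, version

# The thread above found that newer trl releases break the example;
# 0.11.0 is the version that worked.
REQUIRED_TRL = "0.11.0"

try:
    installed = version("trl")
except PackageNotFoundError:
    installed = None

if installed == REQUIRED_TRL:
    print("trl version OK")
else:
    print(f"warning: expected trl=={REQUIRED_TRL}, found {installed}")
```

Equivalently, the pin can simply go into the example's requirements (pip install trl==0.11.0) so a fresh environment never pulls in an incompatible release.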

@jason-dai
Contributor

> Synced offline, pip install trl==0.11.0 solves the problem. Feel free to tell us if there are further issues later :)

Shall we update the example readme?

@hkvision
Contributor

hkvision commented Apr 7, 2025

> > Synced offline, pip install trl==0.11.0 solves the problem. Feel free to tell us if there are further issues later :)
>
> Shall we update the example readme?

Sure :)
