Error: flag provided but not defined: -ngl #13115

Closed
aarononeal opened this issue Apr 27, 2025 · 15 comments

aarononeal commented Apr 27, 2025

Describe the bug
The Ollama runner fails with exit code 2 because -ngl is not a recognized flag.

How to reproduce
Steps to reproduce the error:

curl http://localhost:11434/api/generate -d '
{
   "model": "gemma3:4b",
   "prompt": "What is AI?",
   "stream": false
}'

Screenshots

time=2025-04-27T17:34:25.745+08:00 level=INFO source=server.go:426 msg="starting llama server" cmd="/usr/local/lib/python3.11/dist-packages/bigdl/cpp/libs/ollama/ollama-lib runner --ollama-engine --model /root/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 2048 --batch-size 512 -ngl 999 --threads 18 --no-mmap --parallel 1 --port 34611"
time=2025-04-27T17:34:25.746+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-04-27T17:34:25.746+08:00 level=INFO source=server.go:601 msg="waiting for llama runner to start responding"
time=2025-04-27T17:34:25.746+08:00 level=INFO source=server.go:635 msg="waiting for server to become available" status="llm server error"
flag provided but not defined: -ngl
Runner usage
...
   -n-gpu-layers int
    	Number of layers to offload to GPU

Environment information
Output of the environment check script (env-check.sh):

-----------------------------------------------------------------
PYTHON_VERSION=3.11.12
-----------------------------------------------------------------
/usr/local/lib/python3.11/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
transformers=4.36.2
-----------------------------------------------------------------
torch=2.2.0+cu121
-----------------------------------------------------------------
ipex-llm Version: 2.3.0b20250426
-----------------------------------------------------------------
IPEX is not installed. 
-----------------------------------------------------------------
CPU Information: 
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        46 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               18
On-line CPU(s) list:                  0-17
Vendor ID:                            GenuineIntel
Model name:                           13th Gen Intel(R) Core(TM) i7-1370P
CPU family:                           6
Model:                                186
Thread(s) per core:                   1
Core(s) per socket:                   18
Socket(s):                            1
Stepping:                             2
BogoMIPS:                             4377.60
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni arat vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization:                       VT-x
-----------------------------------------------------------------
Total CPU Memory: 17.529 GB
Memory Type: sudo: dmidecode: command not found
-----------------------------------------------------------------
Operating System: 
Ubuntu 22.04.5 LTS \n \l

-----------------------------------------------------------------
Linux ollama-c98c7b486-5j6w5 6.9.10+bpo-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.9.10-1~bpo12+1 (2024-07-26) x86_64 x86_64 x86_64 GNU/Linux
-----------------------------------------------------------------
./env-check.sh: line 148: xpu-smi: command not found
-----------------------------------------------------------------
  Driver Version                                  2024.18.12.0.05_160000
  Driver UUID                                     32342e35-322e-3332-3232-342e35000000
  Driver Version                                  24.52.32224.5
-----------------------------------------------------------------
Driver related package version:
ii  intel-level-zero-gpu                             1.6.32224.5                             amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  intel-level-zero-gpu-legacy1                     1.3.30872.22                            amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  level-zero-devel                                 1.20.2                                  amd64        oneAPI Level Zero
-----------------------------------------------------------------
igpu not detected
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Iris(R) Xe Graphics 12.3.0 [1.6.32224.500000]
[opencl:cpu][opencl:0] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-1370P OpenCL 3.0 (Build 0) [2024.18.12.0.05_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO  [24.52.32224.5]
-----------------------------------------------------------------
xpu-smi is not installed. Please install xpu-smi according to README.md

Additional context
Note that the runner fails because -ngl is passed, while the flag the runner actually accepts is -n-gpu-layers (see the usage output above).
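
As an illustration only (not part of the original report), here is the failing runner command from the log above, re-typed by hand with -ngl swapped for the --n-gpu-layers spelling listed in the runner usage; the paths and values are copied from that log and will differ on other machines:

# sketch: same invocation as the cmd= line above, but with the long-form GPU-layers flag
/usr/local/lib/python3.11/dist-packages/bigdl/cpp/libs/ollama/ollama-lib runner \
  --ollama-engine \
  --model /root/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 \
  --ctx-size 2048 --batch-size 512 \
  --n-gpu-layers 999 \
  --threads 18 --no-mmap --parallel 1 --port 34611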

sgwhat (Contributor) commented Apr 28, 2025

Hi @aarononeal, which version of ipex-llm ollama are you running?

kirel commented Apr 28, 2025

I see the same error when I build the latest xpu-cpp container according to the guide.

aarononeal (Author)

@sgwhat I followed the Docker quickstart using the image intelanalytics/ipex-llm-inference-cpp-xpu:latest, which for some reason reports Ollama version 0.0.0.
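
For reference, a rough way to check the reported version inside the running container (the container name is a placeholder; the ollama path matches later comments in this thread):

docker exec -it <your-ipex-llm-container> /llm/ollama/ollama --version   # reports 0.0.0 with this image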

kirel commented Apr 28, 2025

Can also confirm that.

Apokalypzx commented Apr 29, 2025

Confirmed

/llm/ollama/./ollama --version
version=0.0.0 

Gemma3:4b (Q4_0 and QA, along with Gemma3:1b) results in the "-ngl provided but not defined" error.

I also tested Mistral:7b and llama3.1:8b; both load and run inference on the GPU without error.

Environment: Docker container built and run exactly as instructed in ipex-llm/docker/llm/inference-cpp. I exported ONEAPI_DEVICE_SELECTOR=level_zero:0 (my GPU) along with OLLAMA_HOST=0.0.0.0 before running /llm/scripts/start-ollama.sh. Running /llm/ollama/ollama downloads, loads, and runs inference on mistral:7b perfectly fine, and the server is visible to open-webui.
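
A rough sketch of that startup sequence, using the exact values mentioned above (assuming the standard container layout from the inference-cpp guide):

export ONEAPI_DEVICE_SELECTOR=level_zero:0   # pin to my GPU
export OLLAMA_HOST=0.0.0.0                   # so open-webui can reach the server
bash /llm/scripts/start-ollama.sh            # start the ollama server
/llm/ollama/ollama run mistral:7b            # works fine; gemma3 hits the -ngl error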

All attempts at running Gemma3 fail with the same error as the OP. I even downloaded the raw .gguf from HF, made my own Modelfile, and created the model that way; the same error occurs when running the model.

stormsteve

Happens for me too.

I just installed ollama-ipex-llm-2.3.0b20250428-win.zip

C:\>ollama run gemma3
Error: llama runner process has terminated: exit status 2

The log shows:
[...]
time=2025-04-29T21:05:57.626+03:00 level=INFO source=server.go:426 msg="starting llama server" cmd="C:\Users\steve\ollama\ollama-lib.exe runner --ollama-engine --model C:\Users\steve\.ollama\models\blobs\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 16384 --batch-size 512 -ngl 999 --threads 6 --no-mmap --parallel 4 --port 61002"
time=2025-04-29T21:05:57.640+03:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-04-29T21:05:57.644+03:00 level=INFO source=server.go:601 msg="waiting for llama runner to start responding"
time=2025-04-29T21:05:57.645+03:00 level=INFO source=server.go:635 msg="waiting for server to become available" status="llm server error"
flag provided but not defined: -ngl
Runner usage
[...]

But gemma2 works without an issue.

nebulakid commented Apr 29, 2025

Same for me, but now for all models. gemma3 starts with:

cmd="C:\\temp\\ollama_nightly\\ollama-lib.exe runner --ollama-engine --model W:\\llm\\blobs\\sha256-3d3a5470ffe8a3b9d5dff58f3be2c87c184b53f0b7fe25887328d17e8fd32eb2 --ctx-size 8192 --batch-size 512 -ngl 999 --threads 4 --no-mmap --parallel 4 --port 32671"

qwen3:

cmd="C:\\temp\\ollama_nightly\\ollama-lib.exe runner --model W:\\llm\\blobs\\sha256-a3de86cd1c132c822487ededd47a324c50491393e6565cd14bafa40d0b8e686f --ctx-size 8192 --batch-size 512 -ngl 999 --threads 4 --no-mmap --parallel 4 --port 32695"

The difference is the --ollama-engine flag.

Version: ollama-ipex-llm-2.3.0b20250428-win.zip

aarononeal (Author)

Ah. Maybe just not supported yet? #12963

sgwhat (Contributor) commented Apr 30, 2025

Hi guys, the version and -ngl issues have been fixed today; you can install the latest version tomorrow via pip install --pre --upgrade ipex-llm[cpp].
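
For reference, inside the Linux container the upgrade might look roughly like this once the nightly is published (whether the start script needs to be re-run and whether the version string changes from 0.0.0 are assumptions on my part; paths mirror earlier comments):

pip install --pre --upgrade ipex-llm[cpp]   # pulls the fixed nightly build
bash /llm/scripts/start-ollama.sh           # restart the ollama server (assumed step)
/llm/ollama/ollama --version                # the broken build reported 0.0.0 here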

aarononeal (Author) commented Apr 30, 2025

@sgwhat, thank you for your efforts here!

I just tried updating and I still get the -ngl error, but I noticed the package installed was 2.3.0b20250428. Is that the correct version with the fix?

Ah, never mind, you said tomorrow! I will check back then. Thanks again!

aarononeal (Author)

So I'm happy to report that the -ngl issue appears to be resolved. However, I'm still not seeing the model load.

These repeat in the logs for a while:

time=2025-05-02T06:39:57.571+08:00 level=INFO source=server.go:639 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-02T06:39:59.454+08:00 level=INFO source=server.go:639 msg="waiting for server to become available" status="llm server not responding"

Before a final timeout:

time=2025-05-02T06:40:13.912+08:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="timed out waiting for llama runner to start - progress 0.00 - "

Other entries related to runner and model load:

time=2025-05-02T06:35:08.589+08:00 level=INFO source=server.go:430 msg="starting llama server" cmd="/usr/local/lib/python3.11/dist-packages/bigdl/cpp/libs/ollama/ollama-lib runner --ollama-engine --model /root/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 2048 --batch-size 512 --n-gpu-layers 999 --threads 18 --no-mmap --parallel 1 --port 40013"
time=2025-05-02T06:35:08.590+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-05-02T06:35:08.591+08:00 level=INFO source=server.go:605 msg="waiting for llama runner to start responding"
time=2025-05-02T06:35:08.592+08:00 level=INFO source=server.go:639 msg="waiting for server to become available" status="llm server error"
time=2025-05-02T06:35:08.775+08:00 level=INFO source=runner.go:757 msg="starting ollama engine"
time=2025-05-02T06:35:08.779+08:00 level=INFO source=runner.go:817 msg="Server listening on 127.0.0.1:40013"
time=2025-05-02T06:35:08.846+08:00 level=INFO source=server.go:639 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-02T06:35:08.863+08:00 level=INFO source=ggml.go:68 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
time=2025-05-02T06:35:09.920+08:00 level=INFO source=ggml.go:296 msg="Number of model weight buffers" count=2
time=2025-05-02T06:35:09.920+08:00 level=INFO source=ggml.go:299 msg="model weights" buffer=CPU size="525.0 MiB"
time=2025-05-02T06:35:09.920+08:00 level=INFO source=ggml.go:299 msg="model weights" buffer=SYCL0 size="3.1 GiB"

aarononeal (Author) commented May 1, 2025

A system restart resolved the above timeouts and the model appears to load. However, I'm now stuck on this issue:

time=2025-05-02T07:18:23.065+08:00 level=INFO source=server.go:644 msg="llama runner started in 5.72 seconds"
panic: failed to sample token: no tokens to sample from

goroutine 11 [running]:
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc000132120, {0x1569620, 0xc000691540})
	/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:335 +0x65
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
	/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:794 +0xa9c

nebulakid commented May 2, 2025

@sgwhat: do you know how often the Windows nightly builds of ollama are built?

jason-dai (Contributor)

> @sgwhat: do you know how often the Windows nightly builds of ollama are built?

For the latest nightly, please refer to https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md

aarononeal (Author)

Although I still can’t get gemma3 working due to the token problem, the specific issue I reported is addressed, so closing this.
