Error: flag provided but not defined: -ngl #13115

Closed
aarononeal opened this issue Apr 27, 2025 · 15 comments

aarononeal commented Apr 27, 2025

Describe the bug
The Ollama runner fails with exit code 2 because -ngl is not a recognized flag.

How to reproduce
Steps to reproduce the error:

curl http://localhost:11434/api/generate -d '
{
   "model": "gemma3:4b",
   "prompt": "What is AI?",
   "stream": false
}'

Screenshots

time=2025-04-27T17:34:25.745+08:00 level=INFO source=server.go:426 msg="starting llama server" cmd="/usr/local/lib/python3.11/dist-packages/bigdl/cpp/libs/ollama/ollama-lib runner --ollama-engine --model /root/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 2048 --batch-size 512 -ngl 999 --threads 18 --no-mmap --parallel 1 --port 34611"
time=2025-04-27T17:34:25.746+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-04-27T17:34:25.746+08:00 level=INFO source=server.go:601 msg="waiting for llama runner to start responding"
time=2025-04-27T17:34:25.746+08:00 level=INFO source=server.go:635 msg="waiting for server to become available" status="llm server error"
flag provided but not defined: -ngl
Runner usage
...
   -n-gpu-layers int
    	Number of layers to offload to GPU

Environment information
Output of the environment check script (env-check.sh):

-----------------------------------------------------------------
PYTHON_VERSION=3.11.12
-----------------------------------------------------------------
/usr/local/lib/python3.11/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
transformers=4.36.2
-----------------------------------------------------------------
torch=2.2.0+cu121
-----------------------------------------------------------------
ipex-llm Version: 2.3.0b20250426
-----------------------------------------------------------------
IPEX is not installed. 
-----------------------------------------------------------------
CPU Information: 
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        46 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               18
On-line CPU(s) list:                  0-17
Vendor ID:                            GenuineIntel
Model name:                           13th Gen Intel(R) Core(TM) i7-1370P
CPU family:                           6
Model:                                186
Thread(s) per core:                   1
Core(s) per socket:                   18
Socket(s):                            1
Stepping:                             2
BogoMIPS:                             4377.60
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni arat vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization:                       VT-x
-----------------------------------------------------------------
Total CPU Memory: 17.529 GB
Memory Type: sudo: dmidecode: command not found
-----------------------------------------------------------------
Operating System: 
Ubuntu 22.04.5 LTS \n \l

-----------------------------------------------------------------
Linux ollama-c98c7b486-5j6w5 6.9.10+bpo-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.9.10-1~bpo12+1 (2024-07-26) x86_64 x86_64 x86_64 GNU/Linux
-----------------------------------------------------------------
./env-check.sh: line 148: xpu-smi: command not found
-----------------------------------------------------------------
  Driver Version                                  2024.18.12.0.05_160000
  Driver UUID                                     32342e35-322e-3332-3232-342e35000000
  Driver Version                                  24.52.32224.5
-----------------------------------------------------------------
Driver related package version:
ii  intel-level-zero-gpu                             1.6.32224.5                             amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  intel-level-zero-gpu-legacy1                     1.3.30872.22                            amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  level-zero-devel                                 1.20.2                                  amd64        oneAPI Level Zero
-----------------------------------------------------------------
igpu not detected
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Iris(R) Xe Graphics 12.3.0 [1.6.32224.500000]
[opencl:cpu][opencl:0] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-1370P OpenCL 3.0 (Build 0) [2024.18.12.0.05_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO  [24.52.32224.5]
-----------------------------------------------------------------
xpu-smi is not installed. Please install xpu-smi according to README.md

Additional context
Note that the runner fails because -ngl is passed, while the flag the runner actually accepts is -n-gpu-layers (see the usage output above).
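
As an illustration only (not part of the original report), here is the failing runner command from the log above, re-typed by hand with -ngl swapped for the --n-gpu-layers spelling listed in the runner usage; the paths and values are copied from that log and will differ on other machines:

# sketch: same invocation as the cmd= line above, but with the long-form GPU-layers flag
/usr/local/lib/python3.11/dist-packages/bigdl/cpp/libs/ollama/ollama-lib runner \
  --ollama-engine \
  --model /root/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 \
  --ctx-size 2048 --batch-size 512 \
  --n-gpu-layers 999 \
  --threads 18 --no-mmap --parallel 1 --port 34611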

sgwhat (Contributor) commented Apr 28, 2025

Hi @aarononeal, which version of ipex-llm ollama are you running?

kirel commented Apr 28, 2025

I see the same error when I build the latest xpu-cpp container according to the guide.

aarononeal (Author)

@sgwhat I followed the Docker quickstart using the image intelanalytics/ipex-llm-inference-cpp-xpu:latest, which for some reason reports Ollama version 0.0.0.
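
For reference, a rough way to check the reported version inside the running container (the container name is a placeholder; the ollama path matches later comments in this thread):

docker exec -it <your-ipex-llm-container> /llm/ollama/ollama --version   # reports 0.0.0 with this image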

kirel commented Apr 28, 2025

Can also confirm that.

Apokalypzx commented Apr 29, 2025

Confirmed

/llm/ollama/./ollama --version
version=0.0.0 

Gemma3:4b (Q4_0 and QA, along with Gemma3:1b) results in the "-ngl provided but not defined" error.

I also tested Mistral:7b and llama3.1:8b; both load and run inference on the GPU without error.

Environment: Docker container built and run exactly as instructed in ipex-llm/docker/llm/inference-cpp. I exported ONEAPI_DEVICE_SELECTOR=level_zero:0 (my GPU) along with OLLAMA_HOST=0.0.0.0 before running /llm/scripts/start-ollama.sh. Running /llm/ollama/ollama downloads, loads, and runs inference on mistral:7b perfectly fine, and the server is visible to open-webui.
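
A rough sketch of that startup sequence, using the exact values mentioned above (assuming the standard container layout from the inference-cpp guide):

export ONEAPI_DEVICE_SELECTOR=level_zero:0   # pin to my GPU
export OLLAMA_HOST=0.0.0.0                   # so open-webui can reach the server
bash /llm/scripts/start-ollama.sh            # start the ollama server
/llm/ollama/ollama run mistral:7b            # works fine; gemma3 hits the -ngl error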

All attempts at running Gemma3 fail with the same error as the OP. I even downloaded the raw .gguf from HF, made my own Modelfile, and created the model that way; the same error occurs when running the model.

stormsteve

Happens for me too.

I just installed ollama-ipex-llm-2.3.0b20250428-win.zip

C:\>ollama run gemma3
Error: llama runner process has terminated: exit status 2

The log shows:
[...]
time=2025-04-29T21:05:57.626+03:00 level=INFO source=server.go:426 msg="starting llama server" cmd="C:\Users\steve\ollama\ollama-lib.exe runner --ollama-engine --model C:\Users\steve\.ollama\models\blobs\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 16384 --batch-size 512 -ngl 999 --threads 6 --no-mmap --parallel 4 --port 61002"
time=2025-04-29T21:05:57.640+03:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-04-29T21:05:57.644+03:00 level=INFO source=server.go:601 msg="waiting for llama runner to start responding"
time=2025-04-29T21:05:57.645+03:00 level=INFO source=server.go:635 msg="waiting for server to become available" status="llm server error"
flag provided but not defined: -ngl
Runner usage
[...]

But gemma2 works without an issue.

nebulakid commented Apr 29, 2025

Same for me, but now for all models. gemma3 starts with:

cmd="C:\\temp\\ollama_nightly\\ollama-lib.exe runner --ollama-engine --model W:\\llm\\blobs\\sha256-3d3a5470ffe8a3b9d5dff58f3be2c87c184b53f0b7fe25887328d17e8fd32eb2 --ctx-size 8192 --batch-size 512 -ngl 999 --threads 4 --no-mmap --parallel 4 --port 32671"

qwen3:

cmd="C:\\temp\\ollama_nightly\\ollama-lib.exe runner --model W:\\llm\\blobs\\sha256-a3de86cd1c132c822487ededd47a324c50491393e6565cd14bafa40d0b8e686f --ctx-size 8192 --batch-size 512 -ngl 999 --threads 4 --no-mmap --parallel 4 --port 32695"

The difference is the --ollama-engine flag.

Version: ollama-ipex-llm-2.3.0b20250428-win.zip

aarononeal (Author)

Ah. Maybe just not supported yet? #12963

sgwhat (Contributor) commented Apr 30, 2025

Hi guys, the version and -ngl issues have been fixed today; you can install the latest version tomorrow via pip install --pre --upgrade ipex-llm[cpp].
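
For reference, inside the Linux container the upgrade might look roughly like this once the nightly is published (whether the start script needs to be re-run and whether the version string changes from 0.0.0 are assumptions on my part; paths mirror earlier comments):

pip install --pre --upgrade ipex-llm[cpp]   # pulls the fixed nightly build
bash /llm/scripts/start-ollama.sh           # restart the ollama server (assumed step)
/llm/ollama/ollama --version                # the broken build reported 0.0.0 here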

aarononeal (Author) commented Apr 30, 2025

@sgwhat, thank you for your efforts here!

I just tried updating and I still get the -ngl error, but I noticed the package installed was 2.3.0b20250428. Is that the correct version with the fix?

Ah, never mind, you said tomorrow! I will check back then. Thanks again!

aarononeal (Author)

So I'm happy to report that the -ngl issue appears to be resolved. However, I'm still not seeing the model load.

These repeat in the logs for a while:

time=2025-05-02T06:39:57.571+08:00 level=INFO source=server.go:639 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-02T06:39:59.454+08:00 level=INFO source=server.go:639 msg="waiting for server to become available" status="llm server not responding"

Before a final timeout:

time=2025-05-02T06:40:13.912+08:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="timed out waiting for llama runner to start - progress 0.00 - "

Other entries related to runner and model load:

time=2025-05-02T06:35:08.589+08:00 level=INFO source=server.go:430 msg="starting llama server" cmd="/usr/local/lib/python3.11/dist-packages/bigdl/cpp/libs/ollama/ollama-lib runner --ollama-engine --model /root/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 2048 --batch-size 512 --n-gpu-layers 999 --threads 18 --no-mmap --parallel 1 --port 40013"
time=2025-05-02T06:35:08.590+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-05-02T06:35:08.591+08:00 level=INFO source=server.go:605 msg="waiting for llama runner to start responding"
time=2025-05-02T06:35:08.592+08:00 level=INFO source=server.go:639 msg="waiting for server to become available" status="llm server error"
time=2025-05-02T06:35:08.775+08:00 level=INFO source=runner.go:757 msg="starting ollama engine"
time=2025-05-02T06:35:08.779+08:00 level=INFO source=runner.go:817 msg="Server listening on 127.0.0.1:40013"
time=2025-05-02T06:35:08.846+08:00 level=INFO source=server.go:639 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-02T06:35:08.863+08:00 level=INFO source=ggml.go:68 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
time=2025-05-02T06:35:09.920+08:00 level=INFO source=ggml.go:296 msg="Number of model weight buffers" count=2
time=2025-05-02T06:35:09.920+08:00 level=INFO source=ggml.go:299 msg="model weights" buffer=CPU size="525.0 MiB"
time=2025-05-02T06:35:09.920+08:00 level=INFO source=ggml.go:299 msg="model weights" buffer=SYCL0 size="3.1 GiB"

aarononeal (Author) commented May 1, 2025

A system restart resolved the above timeouts and the model appears to load. However, I'm now stuck on this issue:

time=2025-05-02T07:18:23.065+08:00 level=INFO source=server.go:644 msg="llama runner started in 5.72 seconds"
panic: failed to sample token: no tokens to sample from

goroutine 11 [running]:
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc000132120, {0x1569620, 0xc000691540})
	/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:335 +0x65
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
	/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:794 +0xa9c

nebulakid commented May 2, 2025

@sgwhat: do you know how often the Windows nightly builds of ollama are built?

jason-dai (Contributor)

> @sgwhat: do you know how often the Windows nightly builds of ollama are built?

For the latest nightly, please refer to https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md

aarononeal (Author)

Although I still can’t get gemma3 working due to the token problem, the specific issue I reported is addressed, so closing this.
