need help on llama.cpp in docker container with arm64 (raspberry pi 5)+ vulkan + amdgpu #12639

zphilip · 2025-03-29T03:17:16Z


Hi,
My arm64 (Raspberry Pi 5) + Vulkan + amdgpu (RX 6700 XT) setup works fine on the physical machine running Raspberry Pi OS (6.6.y). The problem is the Docker container based on it. I created a Docker container to run llama.cpp: Debian bookworm (arm64) + Vulkan + amdgpu (RX 6700 XT), with /dev/dri/ mapped into it, but it fails with the following output. What does "Bus error (core dumped)" mean?

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6700 XT (RADV NAVI22) (radv) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 65536 | matrix cores: none
build: 4984 (5d01670) with cc (Debian 12.2.0-14) 12.2.0 for aarch64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device Vulkan0 (AMD Radeon RX 6700 XT (RADV NAVI22)) - 12032 MiB free
llama_model_loader: loaded meta data with 35 key-value pairs and 255 tensors from models/Llama-3.2-3B-Instruct-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
...........
load_tensors: loading model tensors, this can take a while... (mmap = true)
make_cpu_buft_list: disabling extra buffer types (i.e. repacking) since a GPU device is available
load_tensors: offloading 28 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 29/29 layers to GPU
load_tensors: Vulkan0 model buffer size = 1918.35 MiB
load_tensors: CPU_Mapped model buffer size = 308.23 MiB
...........................................................................
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: freq_base = 500000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
Bus error (core dumped)
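
On the literal question: "Bus error (core dumped)" is the message the shell prints when a process is killed by the SIGBUS signal and a core file is written. It does not by itself say which memory access was at fault. The sketch below only illustrates how that termination shows up to a parent process; the child here is a hypothetical stand-in that sends itself SIGBUS, not llama.cpp, and `describe_exit` is a name made up for this example.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Hypothetical stand-in for the crashing program: a child that sends itself SIGBUS.
    child = [
        sys.executable,
        "-c",
        "import os, signal; os.kill(os.getpid(), signal.SIGBUS)",
    ]
    result = subprocess.run(child)
    print(describe_exit(result.returncode))
```

Wrapping the real binary the same way would only confirm that the exit is a SIGBUS; finding the faulting access would still need the core dump or a debugger.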

vulkaninfo shows:

VULKANINFO

Vulkan Instance Version: 1.3.239

Instance Extensions: count = 20

VK_EXT_acquire_drm_display : extension revision 1
VK_EXT_acquire_xlib_display : extension revision 1
VK_EXT_debug_report : extension revision 10
VK_EXT_debug_utils : extension revision 2
VK_EXT_direct_mode_display : extension revision 1
VK_EXT_display_surface_counter : extension revision 1
VK_KHR_device_group_creation : extension revision 1
VK_KHR_display : extension revision 23
VK_KHR_external_fence_capabilities : extension revision 1
VK_KHR_external_memory_capabilities : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_display_properties2 : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 2
VK_KHR_get_surface_capabilities2 : extension revision 1
VK_KHR_portability_enumeration : extension revision 1
VK_KHR_surface : extension revision 25
VK_KHR_surface_protected_capabilities : extension revision 1
VK_KHR_wayland_surface : extension revision 6
VK_KHR_xcb_surface : extension revision 6
VK_KHR_xlib_surface : extension revision 6

Instance Layers: count = 3

VK_LAYER_KHRONOS_validation Khronos Validation Layer 1.3.239 version 1
VK_LAYER_MESA_device_select Linux device selection layer 1.3.211 version 1
VK_LAYER_MESA_overlay Mesa Overlay layer 1.3.211 version 1

Devices:

GPU0:
apiVersion = 1.3.230
driverVersion = 22.3.6
vendorID = 0x1002
deviceID = 0x73df
deviceType = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
deviceName = AMD Radeon RX 6700 XT (RADV NAVI22)
driverID = DRIVER_ID_MESA_RADV
driverName = radv
driverInfo = Mesa 22.3.6
conformanceVersion = 1.3.0.0
deviceUUID = 00000000-0300-0000-0000-000000000000
driverUUID = 414d442d-4d45-5341-2d44-525600000000

The GPU usage monitor shows a peak.
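
Since the container relies on /dev/dri being mapped in, one basic sanity check is to confirm from inside the container that the DRM nodes exist and are readable and writable by the current user. The following is a minimal sketch under that assumption; `check_dri_nodes` is a hypothetical helper, and because ggml_vulkan already enumerates the GPU in the log above, a passing result would only rule out a missing mapping or a permission problem, not explain the Bus error.

```python
import os
from pathlib import Path


def check_dri_nodes(dri_dir: str = "/dev/dri") -> list[tuple[str, bool]]:
    """Return (path, read_write_ok) for each DRM device node under dri_dir."""
    root = Path(dri_dir)
    if not root.is_dir():
        return []
    results = []
    for node in sorted(root.iterdir()):
        # DRM exposes primary nodes as cardN and render nodes as renderDN.
        if node.name.startswith(("card", "renderD")):
            results.append((str(node), os.access(node, os.R_OK | os.W_OK)))
    return results


if __name__ == "__main__":
    nodes = check_dri_nodes()
    if not nodes:
        print("no DRM nodes found; was /dev/dri mapped into the container?")
    for path, ok in nodes:
        print(f"{path}: {'read/write OK' if ok else 'no read/write access'}")
```

Running it both on the host and inside the container makes it easy to compare which nodes each side sees.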

houckham · 2025-04-26T04:01:07Z


I wanted to build the same thing and just found your post. I got it running natively today on an RPi 5 with Deb 12 6.6-y-gpu, using a recompiled and patched kernel, with PCIe OCuLink to an AMD GPU. It runs great. Now I would like to containerise the build for Docker and eventually K8s.

https://www.jeffgeerling.com/blog/2024/llms-accelerated-egpu-on-raspberry-pi-5
https://gist.github.com/mgarratt/afb3b57a08e2eb2479eb6083a86d8a64
