
ollama runs into segmentation faults in gvisor + nvproxy #11098

Closed
markusthoemmes opened this issue Oct 31, 2024 · 3 comments · Fixed by #11102
Labels
area: gpu (Issue related to sandboxed GPU access) · type: bug (Something isn't working)

Comments

@markusthoemmes

Description

As per title, really. Running a pretty basic ollama pod and then trying to run a model within it results in a segmentation fault when starting the server.

I've tried to hunt for the reason, and the strace output seems to contain hints of it running out of memory:

D1031 13:54:34.084222       1 task_run.go:313] [ 108( 107): 108( 107)] Unhandled user fault: addr=7f5c87ffd9a0 ip=7f5f2d790148 access=r-- sig=11 err=stub syscall (9, []arch.SyscallArgument{arch.SyscallArgument{Value:0x7f5c87ffa000}, arch.SyscallArgument{Value:0x4000}, arch.SyscallArgument{Value:0x1}, arch.SyscallArgument{Value:0x11}, arch.SyscallArgument{Value:0x32}, arch.SyscallArgument{Value:0x5df24000}}) failed with cannot allocate memory

However, the process should have plenty of memory available, so I seem to be missing something.

Steps to reproduce

I'm testing this in a Kubernetes pod, like so:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      runtimeClassName: nvidia
      volumes:
      containers:
        - name: cuda-container
          image: ollama/ollama
          ports:
            - containerPort: 11434
          resources:
            limits:
              cpu: "10"
              memory: 20G
            requests:
              cpu: "2"
              memory: 6G
          env:
          - name: PATH
            value: /usr/local/nvidia/bin:/bin/nvidia/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
          - name: LD_LIBRARY_PATH
            value: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
          - name: NVIDIA_DRIVER_CAPABILITIES
            value: compute,utility
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule

I then exec into the pod and do ollama run granite3-dense (for example; the model doesn't seem to matter, I tried a few) and get:

Error: llama runner process has terminated: signal: segmentation fault
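
(For reference, in a setup like the manifest above, the exec step amounts to something like kubectl exec -it deploy/ollama -- ollama run granite3-dense.)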

The gvisor setup is as per https://gvisor.dev/docs/user_guide/containerd/configuration/.

runsc version

$ runsc --version
runsc version release-20241021.0
spec: 1.1.0-rc.1

docker version (if using docker)

No response

uname

Linux pool-2i3vypv5x-g3267 6.1.0-25-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux

kubectl (if using Kubernetes)

$ kubectl version
Client Version: v1.31.2
Kustomize Version: v5.4.2
Server Version: v1.31.1

repo state (if built from source)

No response

runsc debug logs (if available)

debug-strace.log.zip

markusthoemmes added the type: bug (Something isn't working) label on Oct 31, 2024
ayushr2 (Collaborator) commented Oct 31, 2024

Try setting /proc/sys/vm/max_map_count to 1000000. If that fixes things, then this is caused by a known issue: runsc-sandbox exhausts host VMAs while dealing with such large model files in memory.
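
For reference, applying that on the host node (not inside the pod) would look something like the following; the sysctl.d file name is just an illustrative choice:

# raise the per-process VMA limit on the host that runs runsc
sudo sysctl -w vm.max_map_count=1000000

# optionally persist the setting across reboots
echo 'vm.max_map_count=1000000' | sudo tee /etc/sysctl.d/99-vm-max-map-count.conf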

ayushr2 added the area: gpu (Issue related to sandboxed GPU access) label on Oct 31, 2024
ayushr2 (Collaborator) commented Oct 31, 2024

You can also pass the --host-settings=enforce flag to runsc, which was added for this reason in 8e60158.
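
If runsc is invoked through the containerd shim (as in the configuration guide linked above), one way to wire this in is via the shim's runsc.toml; a sketch, assuming the [runsc_config] flag section and the /etc/containerd/runsc.toml config path from the gVisor containerd docs — adjust for your own runtime handler setup:

# add the flag to the shim's runsc config (edit the existing [runsc_config]
# section instead if the file already has one), then restart containerd
sudo tee -a /etc/containerd/runsc.toml <<'EOF'
[runsc_config]
  host-settings = "enforce"
EOF
sudo systemctl restart containerd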

markusthoemmes (Author)

Yep, that worked!
