
ollama runs into segmentation faults in gvisor + nvproxy #11098

Closed
markusthoemmes opened this issue Oct 31, 2024 · 3 comments · Fixed by #11102
Labels
area: gpu (Issue related to sandboxed GPU access) · type: bug (Something isn't working)

Comments

@markusthoemmes

Description

As per title, really. Running a pretty basic ollama pod and then trying to run a model within it results in a segmentation fault when starting the server.

I've tried to hunt for the reason, and the strace output seems to contain hints of it running out of memory:

D1031 13:54:34.084222       1 task_run.go:313] [ 108( 107): 108( 107)] Unhandled user fault: addr=7f5c87ffd9a0 ip=7f5f2d790148 access=r-- sig=11 err=stub syscall (9, []arch.SyscallArgument{arch.SyscallArgument{Value:0x7f5c87ffa000}, arch.SyscallArgument{Value:0x4000}, arch.SyscallArgument{Value:0x1}, arch.SyscallArgument{Value:0x11}, arch.SyscallArgument{Value:0x32}, arch.SyscallArgument{Value:0x5df24000}}) failed with cannot allocate memory

However, the process should have plenty of memory available, so I seem to be missing something.

Steps to reproduce

I'm testing this in a Kubernetes pod, like so:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      runtimeClassName: nvidia
      volumes:
      containers:
        - name: cuda-container
          image: ollama/ollama
          ports:
            - containerPort: 11434
          resources:
            limits:
              cpu: "10"
              memory: 20G
            requests:
              cpu: "2"
              memory: 6G
          env:
          - name: PATH
            value: /usr/local/nvidia/bin:/bin/nvidia/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
          - name: LD_LIBRARY_PATH
            value: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
          - name: NVIDIA_DRIVER_CAPABILITIES
            value: compute,utility
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule

I then exec into the pod and do ollama run granite3-dense (for example; the model doesn't seem to matter, I tried a few) and get:

Error: llama runner process has terminated: signal: segmentation fault
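
(For reference, in a setup like the manifest above, the exec step amounts to something like kubectl exec -it deploy/ollama -- ollama run granite3-dense.)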

The gvisor setup is as per https://gvisor.dev/docs/user_guide/containerd/configuration/.

runsc version

$ runsc --version
runsc version release-20241021.0
spec: 1.1.0-rc.1

docker version (if using docker)

No response

uname

Linux pool-2i3vypv5x-g3267 6.1.0-25-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux

kubectl (if using Kubernetes)

$ kubectl version
Client Version: v1.31.2
Kustomize Version: v5.4.2
Server Version: v1.31.1

repo state (if built from source)

No response

runsc debug logs (if available)

debug-strace.log.zip

markusthoemmes added the type: bug (Something isn't working) label on Oct 31, 2024
ayushr2 (Collaborator) commented Oct 31, 2024

Try setting /proc/sys/vm/max_map_count to 1000000. If that fixes things, then this is caused by a known issue: runsc-sandbox exhausts host VMAs while dealing with such large model files in memory.
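
For reference, applying that on the host node (not inside the pod) would look something like the following; the sysctl.d file name is just an illustrative choice:

# raise the per-process VMA limit on the host that runs runsc
sudo sysctl -w vm.max_map_count=1000000

# optionally persist the setting across reboots
echo 'vm.max_map_count=1000000' | sudo tee /etc/sysctl.d/99-vm-max-map-count.conf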

ayushr2 added the area: gpu (Issue related to sandboxed GPU access) label on Oct 31, 2024
ayushr2 (Collaborator) commented Oct 31, 2024

You can also pass the --host-settings=enforce flag to runsc, which was added for this reason in 8e60158.
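
If runsc is invoked through the containerd shim (as in the configuration guide linked above), one way to wire this in is via the shim's runsc.toml; a sketch, assuming the [runsc_config] flag section and the /etc/containerd/runsc.toml config path from the gVisor containerd docs — adjust for your own runtime handler setup:

# add the flag to the shim's runsc config (edit the existing [runsc_config]
# section instead if the file already has one), then restart containerd
sudo tee -a /etc/containerd/runsc.toml <<'EOF'
[runsc_config]
  host-settings = "enforce"
EOF
sudo systemctl restart containerd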

markusthoemmes (Author)

Yep, that worked!
