
Commit afbb4c1

ggml-cuda: Adding support for unified memory (#8035)
* Adding support for unified memory
* adding again the documentation about unified memory
* refactoring: Moved the unified memory code in the correct location.
* Fixed compilation error when using hipblas
* cleaning up the documentation
* Updating the documentation

  Co-authored-by: Johannes Gäßler <[email protected]>
* adding one more case where the PR should not be enabled

---------

Co-authored-by: matteo serva <[email protected]>
Co-authored-by: Johannes Gäßler <[email protected]>
1 parent b7a08fd commit afbb4c1


2 files changed, 20 insertions(+), 1 deletion(-)


docs/build.md (+5, -1)
````diff
@@ -178,7 +178,11 @@ For Jetson user, if you have Jetson Orin, you can try this: [Offical Support](ht
 cmake --build build --config Release
 ```

-The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) can be used to specify which GPU(s) will be used. The following compilation options are also available to tweak performance:
+The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) can be used to specify which GPU(s) will be used.
+
+The environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` can be used to enable unified memory in Linux. This allows swapping to system RAM instead of crashing when the GPU VRAM is exhausted. In Windows this setting is available in the NVIDIA control panel as `System Memory Fallback`.
+
+The following compilation options are also available to tweak performance:

 | Option | Legal values | Default | Description |
 |-------------------------------|------------------------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
````
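
The documentation asks for `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1`. However, the check added to `ggml_cuda_device_malloc` in this commit (`getenv(...) != nullptr`) only tests whether the variable is defined and never reads its value. The host-only C++ sketch below contrasts that presence test with a stricter parse. Both helper names, `env_is_set` and `env_is_one`, are hypothetical. The stricter variant is only an illustrative alternative and is not what ggml does.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper mirroring the committed check: the variable only has to
// be defined, so "1", "0" and "" all count as "enabled".
static bool env_is_set(const char * name) {
    return std::getenv(name) != nullptr;
}

// Hypothetical stricter alternative (NOT the ggml behaviour): only the exact
// value "1" enables the feature; "0", "" or an unset variable leave it off.
static bool env_is_one(const char * name) {
    const char * value = std::getenv(name);
    return value != nullptr && std::strcmp(value, "1") == 0;
}
```

With the variable set to `0`, `env_is_set` returns true and `env_is_one` returns false. Under the committed check, therefore, `GGML_CUDA_ENABLE_UNIFIED_MEMORY=0` still enables unified memory, and only unsetting the variable turns it off.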

ggml/src/ggml-cuda.cu (+15)
```diff
@@ -130,7 +130,22 @@ static cudaError_t ggml_cuda_device_malloc(void ** ptr, size_t size, int device)
     }
     return res;
 #else
+
+#if !defined(GGML_USE_HIPBLAS) && !defined(GGML_USE_MUSA)
+    cudaError_t err;
+    if (getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY") != nullptr)
+    {
+        err = cudaMallocManaged(ptr, size);
+    }
+    else
+    {
+        err = cudaMalloc(ptr, size);
+    }
+    return err;
+#else
     return cudaMalloc(ptr, size);
+#endif // !defined(GGML_USE_HIPBLAS) && !defined(GGML_USE_MUSA)
+
 #endif
 }
```
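
The allocator choice added in this hunk can be modelled in plain host C++ so that the selection logic compiles and runs without the CUDA toolkit. The sketch below is not ggml code. `AllocKind` and `choose_allocator` are hypothetical names, and the two enum values stand in for the `cudaMalloc` and `cudaMallocManaged` calls. It keeps the same preprocessor guard, so builds that define `GGML_USE_HIPBLAS` or `GGML_USE_MUSA` always take the plain allocation path here.

```cpp
#include <cstdlib>

// Hypothetical stand-ins for the two CUDA allocation calls.
enum class AllocKind {
    Device,  // corresponds to cudaMalloc(ptr, size)
    Managed, // corresponds to cudaMallocManaged(ptr, size)
};

// Mirrors the #else branch of ggml_cuda_device_malloc after this commit.
static AllocKind choose_allocator() {
#if !defined(GGML_USE_HIPBLAS) && !defined(GGML_USE_MUSA)
    // CUDA builds: the mere presence of the variable selects managed memory.
    if (std::getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY") != nullptr) {
        return AllocKind::Managed;
    }
    return AllocKind::Device;
#else
    // HIP and MUSA builds ignore the variable on this path.
    return AllocKind::Device;
#endif
}
```

The environment is read on every call, so the decision is made at runtime for each allocation rather than at compile time. As a result, the same CUDA binary can run with or without unified memory depending only on how it is launched.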
