Commit dd64ba5

[SYCL][DOC] CUDA and HIP GetStartedGuide updates
* Fix links to configure.py and compile.py
  * Linking to the file in tree caused the link in the docs to just download the python script. It makes more sense to link to the github web UI for these as they should be used in a checkout anyway.
* Remove CUDA requiring device selector
  * Since intel#6203 the device selector should handle this case well.
* Update HIP backend limitations
  * HIP is no longer in beta
  * Windows isn't supported but intel#17702 made the build work with it so it might work for some users.
  * Global offset has been supported since intel#5855
  * Add common limitations
* Update HIP for Nvidia section
  * This might work but is not supported
* Update HIP section
  * Recommended HIP version + testing platforms
  * HIP is no longer in beta
* Add note on target aliases for CUDA and HIP
1 parent cb5ef36 commit dd64ba5

File tree: 1 file changed (+36 lines, -31 lines)


sycl/doc/GetStartedGuide.md

````diff
@@ -90,8 +90,8 @@ git clone --config core.autocrlf=false https://github.com/intel/llvm -b sycl
 ## Build DPC++ toolchain
 
 The easiest way to get started is to use the buildbot
-[configure](../../buildbot/configure.py) and
-[compile](../../buildbot/compile.py) scripts.
+[configure](https://github.com/intel/llvm/blob/sycl/buildbot/configure.py) and
+[compile](https://github.com/intel/llvm/blob/sycl/buildbot/compile.py) scripts.
 
 In case you want to configure CMake manually the up-to-date reference for
 variables is in these files.
````
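As a quick orientation, the buildbot scripts referenced above are typically run from a fresh checkout along these lines; this is a sketch that assumes the `$DPCPP_HOME` workspace layout used elsewhere in the guide:

```shell
# Sketch of a default Linux build, assuming $DPCPP_HOME is the directory
# that contains the llvm checkout (as set up earlier in the guide).
export DPCPP_HOME=~/sycl_workspace
python $DPCPP_HOME/llvm/buildbot/configure.py   # generate the CMake build tree
python $DPCPP_HOME/llvm/buildbot/compile.py     # build the toolchain
```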
````diff
@@ -233,21 +233,21 @@ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$DPCPP_HOME/llvm/build/lib ./a.out
 
 ### Build DPC++ toolchain with support for HIP AMD
 
-There is beta support for oneAPI DPC++ for HIP on AMD devices. It is not feature
-complete and it still contains known and unknown bugs. Currently it has only
-been tried on Linux, with ROCm 4.2.0, 4.3.0, 4.5.2, 5.3.0, and 5.4.3, using the
-AMD Radeon Pro W6800 (gtx1030), MI50 (gfx906), MI100 (gfx908) and MI250x
-(gfx90a) devices. The backend is tested by a relevant device/toolkit prior to a
-oneAPI plugin release. Go to the plugin release
-[pages](https://developer.codeplay.com/products/oneapi/amd) for further details.
-
 To enable support for HIP devices, follow the instructions for the Linux DPC++
 toolchain, but add the `--hip` flag to `configure.py`.
 
 Enabling this flag requires an installation of ROCm on the system, for
 instruction on how to install this refer to
 [AMD ROCm Installation Guide for Linux](https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html).
 
+ROCm versions above 5.7 are recommended as earlier versions don't have graph
+support. DPC++ aims to support new ROCm versions as they come out, so there may
+be a delay but generally the latest ROCm version should work. The ROCm support
+is mostly tested on AMD Radeon Pro W6800 (gfx1030), and MI250x (gfx90a), however
+other architectures supported by LLVM may work just fine. The full list of ROCm
+versions tested prior to oneAPI releases are listed on the plugin release
+[pages](https://developer.codeplay.com/products/oneapi/amd).
+
 The DPC++ build assumes that ROCm is installed in `/opt/rocm`, if it is
 installed somewhere else, the directory must be provided through the CMake
 variable `UR_HIP_ROCM_DIR` which can be passed through to cmake using the
````
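Putting the hunk above together, a HIP AMD build with a non-default ROCm location might be configured as in this sketch; note that `--cmake-opt` as the CMake pass-through flag is an assumption here, since the hunk is truncated before naming it:

```shell
# Sketch: HIP AMD build with ROCm installed outside /opt/rocm.
# The --cmake-opt flag name is assumed, not confirmed by the hunk above.
python $DPCPP_HOME/llvm/buildbot/configure.py --hip \
  --cmake-opt="-DUR_HIP_ROCM_DIR=/custom/rocm"
python $DPCPP_HOME/llvm/buildbot/compile.py
```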
````diff
@@ -276,7 +276,10 @@ by default when configuring for HIP. For more details on building LLD refer to
 
 ### Build DPC++ toolchain with support for HIP NVIDIA
 
-There is experimental support for oneAPI DPC++ for HIP on Nvidia devices.
+HIP applications can be built to target Nvidia GPUs, so in theory it is possible
+to build the DPC++ HIP support for Nvidia, however this is not supported, so it
+may not work.
+
 There is no continuous integration for this and there are no guarantees for
 supported platforms or configurations.
 
````
````diff
@@ -288,13 +291,12 @@ To enable support for HIP NVIDIA devices, follow the instructions for the Linux
 DPC++ toolchain, but add the `--hip` and `--hip-platform NVIDIA` flags to
 `configure.py`.
 
-Enabling this flag requires HIP to be installed, more specifically
-[HIP NVCC](https://rocmdocs.amd.com/en/latest/Installation_Guide/HIP-Installation.html#nvidia-platform),
-as well as the CUDA Runtime API to be installed, see
-[NVIDIA CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html).
-
-Currently, this has only been tried on Linux, with ROCm 4.2.0 or 4.3.0, with
-CUDA 11, and using a GeForce 1060 device.
+Enabling this flag requires HIP to be installed, specifically for Nvidia, see
+the Nvidia tab on the HIP installation docs
+[here](https://rocm.docs.amd.com/projects/HIP/en/latest/install/install.html),
+as well as the CUDA Runtime API to be installed, see [NVIDIA CUDA Installation
+Guide for
+Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html).
 
 ### Build DPC++ toolchain with support for ARM processors
 
````
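The HIP NVIDIA configuration described in the two hunks above reduces to two extra configure flags; as a sketch of the unsupported setup (it may not work):

```shell
# Unsupported configuration: HIP backend targeting Nvidia GPUs.
# Requires HIP (Nvidia platform) and the CUDA Runtime API installed.
python $DPCPP_HOME/llvm/buildbot/configure.py --hip --hip-platform NVIDIA
python $DPCPP_HOME/llvm/buildbot/compile.py
```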
````diff
@@ -705,14 +707,6 @@ clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
 The results are correct!
 ```
 
-**NOTE**: Currently, when the application has been built with the CUDA target,
-the CUDA backend must be selected at runtime using the `ONEAPI_DEVICE_SELECTOR`
-environment variable.
-
-```bash
-ONEAPI_DEVICE_SELECTOR=cuda:* ./simple-sycl-app-cuda.exe
-```
-
 **NOTE**: oneAPI DPC++/SYCL developers can specify SYCL device for execution
 using device selectors (e.g. `sycl::cpu_selector_v`, `sycl::gpu_selector_v`,
 [Intel FPGA selector(s)](extensions/supported/sycl_ext_intel_fpga_device_selector.asciidoc))
````
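The deleted note above is removed because, per the commit message, the device selector now handles CUDA builds automatically; the environment variable still works as an explicit override. A sketch, reusing the binary name from the deleted lines:

```shell
# Since intel#6203 the CUDA device is picked up automatically:
./simple-sycl-app-cuda.exe

# Forcing the CUDA backend explicitly still works as an override:
ONEAPI_DEVICE_SELECTOR=cuda:* ./simple-sycl-app-cuda.exe
```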
````diff
@@ -746,6 +740,14 @@ clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
 -Xsycl-target-backend --cuda-gpu-arch=sm_80
 ```
 
+Additionally AMD and Nvidia targets also support aliases for the target to
+simplify passing the specific architectures, for example
+`-fsycl-targets=nvidia_gpu_sm_80` is equivalent to
+`-fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend
+--cuda-gpu-arch=sm_80`, the full list of available aliases is documented in the
+[Users Manual](UsersManual.md#generic-options), for the `-fsycl-targets`
+option.
+
 To build simple-sycl-app ahead of time for GPU, CPU or Accelerator devices,
 specify the target architecture. The examples provided use a supported
 alias for the target, representing a full triple. Additional details can
````
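The alias equivalence added in the hunk above can be shown side by side; both command lines below should produce the same CUDA-targeted binary (source and output file names are placeholders):

```shell
# Spelled-out triple plus backend architecture flag:
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
  -Xsycl-target-backend --cuda-gpu-arch=sm_80 simple-sycl-app.cpp -o app

# Equivalent alias form, per the note added above:
clang++ -fsycl -fsycl-targets=nvidia_gpu_sm_80 simple-sycl-app.cpp -o app
```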
````diff
@@ -914,11 +916,14 @@ int CUDASelector(const sycl::device &Device) {
 
 ### HIP back-end limitations
 
-* Requires a ROCm compatible operating system, for full details of supported
-  Operating System for ROCm, please refer to the
-  [ROCm Supported Operating Systems](https://github.com/RadeonOpenCompute/ROCm#supported-operating-systems).
-* Support is still in a beta state, but the backend is being actively developed.
-* Global offsets are currently not supported.
+* Requires a ROCm compatible system and GPU, see for
+  [Linux](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-skus)
+  and for
+  [Windows](https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html#supported-skus).
+* Windows for HIP is not supported by DPC++ at the moment so it may not work.
+* `printf` within kernels is not supported.
+* C++ standard library functions using complex types are not supported,
+  `sycl::complex` should be used instead.
 
 ## Find More
 
````