sycl/doc/EnvironmentVariables.md
+12 -12
@@ -22,7 +22,7 @@ compiler and runtime.
22
22
|`SYCL_ENABLE_DEFAULT_CONTEXTS`| '1' or '0' | Enable ('1') or disable ('0') creation of default platform contexts in SYCL runtime. The default context for each platform contains all devices in the platform. Refer to [Platform Default Contexts](extensions/supported/sycl_ext_oneapi_default_context.asciidoc) extension to learn more. Enabled by default on Linux and disabled on Windows. |
23
23
|`SYCL_RT_WARNING_LEVEL`| Positive integer | The higher the warning level, the more warnings and performance hints the runtime library may print. Default value is '0', which means no warning/hint messages from the runtime library are allowed. The value '1' enables performance warnings from device runtime/codegen. Values greater than 1 are reserved for future use. |
24
24
|`SYCL_USM_HOSTPTR_IMPORT`| Integer | Enable by specifying non-zero value. Buffers created with a host pointer will result in host data promotion to USM, improving data transfer performance. To use this feature, also set SYCL_HOST_UNIFIED_MEMORY=1. |
25
-
|`SYCL_EAGER_INIT`| Integer | Enable by specifying non-zero value. Tells the SYCL runtime to do as much as possible initialization at objects construction as opposed to doing lazy initialization on the fly. This may mean doing some redundant work at warmup but ensures fastest possible execution on the following hot and reportable paths. It also instructs PI plugins to do the same. Default is "0". |
25
+
|`SYCL_EAGER_INIT`| Integer | Enable by specifying non-zero value. Tells the SYCL runtime to do as much as possible initialization at objects construction as opposed to doing lazy initialization on the fly. This may mean doing some redundant work at warmup but ensures fastest possible execution on the following hot and reportable paths. It also instructs UR adapters to do the same. Default is "0". |
26
26
|`SYCL_REDUCTION_PREFERRED_WORKGROUP_SIZE`| See [below](#sycl_reduction_preferred_workgroup_size)| Controls the preferred work-group size of reductions. |
27
27
|`SYCL_ENABLE_FUSION_CACHING`| '1' or '0' | Enable ('1') or disable ('0') caching of JIT compilations for kernel fusion. Caching avoids repeatedly running the JIT compilation pipeline if the same sequence of kernels is fused multiple times. Default value is '1'. |
28
28
|`SYCL_JIT_AMDGCN_PTX_KERNELS`| '1' or '0' | Enable ('1') or disable ('0') JIT compilation of kernels. Only supported for Nvidia and AMD backends. Note that JIT-ing requires a valid binary for the desired backend (AMD or CUDA) that was compiled with `-fsycl-embed-ir`. When JIT-ing is enabled, the SYCL runtime will try to cache and reuse JIT-compiled kernels. Furthermore, if a kernel uses specialization constants, the compiler will attempt to materialize the values in place, turning them into de facto compile-time constants. Default is '0'. |
@@ -152,23 +152,23 @@ For a description of parallel for range rounding in DPC++ see
152
152
|||`MinRangeX`: The minimum X dimension of the range such that range rounding is activated (Default 1024) |
153
153
154
154
155
-
## Controlling DPC++ Level Zero Plugin
155
+
## Controlling DPC++ Level Zero Adapter
156
156
157
157
| Environment variable | Values | Description |
158
158
| -------------------- | ------ | ----------- |
159
159
|`SYCL_ENABLE_PCI` (Deprecated) | Integer | When set to 1, enables obtaining the GPU PCI address when using the Level Zero backend. The default is 1. This option is kept for compatibility reasons and is immediately deprecated. |
160
-
|`SYCL_PI_LEVEL_ZERO_DISABLE_USM_ALLOCATOR`| Any(\*) | Disable USM allocator in Level Zero plugin (each memory request will go directly to Level Zero runtime) |
161
-
|`SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY`| Any(\*) | Enable support of the kernels with indirect access and corresponding deferred release of memory allocations in the Level Zero plugin. |
160
+
|`SYCL_PI_LEVEL_ZERO_DISABLE_USM_ALLOCATOR`| Any(\*) | Disable USM allocator in Level Zero adapter (each memory request will go directly to Level Zero runtime) |
161
+
|`SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY`| Any(\*) | Enable support of the kernels with indirect access and corresponding deferred release of memory allocations in the Level Zero adapter. |
162
162
163
163
`(*) Note: Any means this environment variable is effective when set to any non-null value.`
164
164
165
-
## Controlling DPC++ CUDA Plugin
165
+
## Controlling DPC++ CUDA Adapter
166
166
167
167
| Environment variable | Values | Description |
168
168
| -------------------- | ------ | ----------- |
169
169
|`SYCL_PI_CUDA_MAX_LOCAL_MEM_SIZE`| Integer | Specifies the maximum size of a local memory allocation in bytes. If the value exceeds the device's capabilities then a `sycl::runtime_error` is thrown. In order for the full error message to be printed, `SYCL_RT_WARNING_LEVEL=2` must be set. The default value for `SYCL_PI_CUDA_MAX_LOCAL_MEM_SIZE` is determined by the hardware. |
170
170
171
-
## Controlling DPC++ HIP Plugin
171
+
## Controlling DPC++ HIP Adapter
172
172
173
173
| Environment variable | Values | Description |
174
174
| -------------------- | ------ | ----------- |
@@ -231,7 +231,7 @@ variables in production code.</span>
231
231
| after_addHostAcc | print graph after addHostAccessor method |
232
232
| always | print graph before and after each of the above methods |
described below are used for development and debugging of DPC++ compiler
@@ -240,15 +240,15 @@ variables in production code.</span>
240
240
241
241
| Environment variable | Values | Description |
242
242
| -------------------- | ------ | ----------- |
243
-
|`SYCL_PI_LEVEL_ZERO_SINGLE_THREAD_MODE`| Integer | A single-threaded app has an opportunity to enable this mode to avoid overhead from mutex locking in the Level Zero plugin. A value greater than 0 enables single thread mode. A value of 0 disables single thread mode. The default is 0. |
243
+
|`SYCL_PI_LEVEL_ZERO_SINGLE_THREAD_MODE`| Integer | A single-threaded app has an opportunity to enable this mode to avoid overhead from mutex locking in the Level Zero adapter. A value greater than 0 enables single thread mode. A value of 0 disables single thread mode. The default is 0. |
244
244
| `SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR` | [EnableBuffers][;[MaxPoolSize][;[host\|device\|shared:][MaxPoolableSize][,[Capacity][,SlabMinSize]]]...] | EnableBuffers enables pooling for SYCL buffers, default 1, set to 0 to disable. MaxPoolSize is the maximum size of the pool, by default there is no size limit. MemType is host, device, shared or read_only_shared. Other parameters are values specified as positive integers with optional K, M or G suffix. MaxPoolableSize is the maximum allocation size that may be pooled, default 0 for shared, 2MB for host, 4MB for device and read_only_shared. Capacity is the number of allocations in each size range freed by the program but retained in the pool for reallocation, default 4. Size ranges follow this pattern: 64, 96, 128, 192, and so on, i.e., powers of 2, with one range in between. SlabMinSize is the minimum allocation size, 64KB for host and device, 2MB for shared and read_only_shared. Example: SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=1;32M;host:1M,4,64K;device:1M,4,64K;shared:0,0,2M|
245
245
|`SYCL_PI_LEVEL_ZERO_BATCH_SIZE`| Integer | Sets a preferred number of compute commands to batch into a command list before executing the command list. A value of 0 causes the batch size to be adjusted dynamically. A value greater than 0 specifies fixed size batching, with the batch size set to the specified value. The default is 0. |
246
246
|`SYCL_PI_LEVEL_ZERO_COPY_BATCH_SIZE`| Integer | Sets a preferred number of copy commands to batch into a command list before executing the command list. A value of 0 causes the batch size to be adjusted dynamically. A value greater than 0 specifies fixed size batching, with the batch size set to the specified value. The default is 0. |
247
247
|`SYCL_PI_LEVEL_ZERO_FILTER_EVENT_WAIT_LIST`| Integer | When set to 0, disables filtering of signaled events from wait lists when using the Level Zero backend. The default is 0. |
248
-
|`SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE`| Any(\*) | This environment variable enables users to control use of copy engines for copy operations. If the value is an integer, it will allow the use of copy engines, if available in the device, in Level Zero plugin to transfer SYCL buffer or image data between the host and/or device(s) and to fill SYCL buffer or image data in device or shared memory. The value of this environment variable can also be a pair of the form "lower_index:upper_index" where the indices point to copy engines in a list of all available copy engines. The default is 0:0 when immediate command lists are being used on the device and 1 otherwise. (Also see description of SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS). |
248
+
|`SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE`| Any(\*) | This environment variable enables users to control use of copy engines for copy operations. If the value is an integer, it will allow the use of copy engines, if available in the device, in Level Zero adapter to transfer SYCL buffer or image data between the host and/or device(s) and to fill SYCL buffer or image data in device or shared memory. The value of this environment variable can also be a pair of the form "lower_index:upper_index" where the indices point to copy engines in a list of all available copy engines. The default is 0:0 when immediate command lists are being used on the device and 1 otherwise. (Also see description of SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS). |
249
249
|`SYCL_PI_LEVEL_ZERO_USE_COMPUTE_ENGINE`| Integer | When set to a non-negative integer, all compute commands are submitted to the command-queue with the given index in the compute command group. When set to a negative value, all available compute engines may be used. The default value is "0". |
250
-
|`SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_D2D_COPY` (experimental) | Integer | Allows the use of copy engine, if available in the device, in Level Zero plugin for device to device copy operations. The default is 0. This option is experimental and will be removed once heuristics are added to make a decision about use of copy engine for device to device copy operations. |
251
-
|`SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS`| Any(\*) | Enable support of device-scope events whose state is not visible to the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=1 the Level Zero plugin would create all events having device-scope only and create proxy host-visible events for them when their status is needed (wait/query) on the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=2 the Level Zero plugin would create all events having device-scope and add proxy host-visible event at the end of each command-list submission. The default is 0, meaning all events have host visibility. SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS is ignored when using immediate command lists (SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS = 1) and all events use default scope of 0. |
250
+
|`SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_D2D_COPY` (experimental) | Integer | Allows the use of copy engine, if available in the device, in Level Zero adapter for device to device copy operations. The default is 0. This option is experimental and will be removed once heuristics are added to make a decision about use of copy engine for device to device copy operations. |
251
+
|`SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS`| Any(\*) | Enable support of device-scope events whose state is not visible to the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=1 the Level Zero adapter would create all events having device-scope only and create proxy host-visible events for them when their status is needed (wait/query) on the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=2 the Level Zero adapter would create all events having device-scope and add proxy host-visible event at the end of each command-list submission. The default is 0, meaning all events have host visibility. SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS is ignored when using immediate command lists (SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS = 1) and all events use default scope of 0. |
252
252
|`SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS`| Integer | When set to a positive value enables use of Level Zero immediate commandlists, which means there is no batching and all commands are immediately submitted for execution. When set to 1, unique immediate commandlists are created for each SYCL queue. When set to 2, unique immediate commandlists are created per host thread per SYCL queue. Default is 1 on Intel® Data Center GPU Max Series running Linux and 0 elsewhere. |
253
253
|`SYCL_PI_LEVEL_ZERO_USE_MULTIPLE_COMMANDLIST_BARRIERS`| Integer | When set to a positive value enables use of multiple Level Zero commandlists when submitting barriers. Default is 1. |
254
254
|`SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_FILL`| Integer | When set to a positive value enables use of a copy engine for memory fill operations. Default is 0. |
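
The `SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR` value format described in this hunk is dense, so a small reader-side sketch may help when checking a setting before exporting it. This is not the adapter's parser: the function names `parse_size` and `parse_usm_allocator`, the `"all"` key for entries without a memory-type prefix, and the assumption that K, M and G are binary multipliers (1024-based) are illustrative only.

```python
# Assumption: K/M/G are 1024-based multipliers; the document does not say so explicitly.
SUFFIX = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Convert '64K', '1M', '32M' or a plain integer string to an integer."""
    text = text.strip()
    if text and text[-1].upper() in SUFFIX:
        return int(text[:-1]) * SUFFIX[text[-1].upper()]
    return int(text)


def parse_usm_allocator(value):
    """Split the value into its ';'-separated fields; omitted fields stay None."""
    fields = value.split(";")
    config = {
        "enable_buffers": int(fields[0]) if fields[0] else None,
        "max_pool_size": parse_size(fields[1]) if len(fields) > 1 and fields[1] else None,
        "mem_types": {},
    }
    names = ("max_poolable_size", "capacity", "slab_min_size")
    for entry in fields[2:]:
        if not entry:
            continue
        mem_type, sep, params = entry.partition(":")
        if not sep:  # no "host:" / "device:" / "shared:" prefix was given
            mem_type, params = "all", entry  # hypothetical key for unprefixed entries
        parts = params.split(",")
        config["mem_types"][mem_type] = {
            name: parse_size(parts[i]) if i < len(parts) and parts[i] else None
            for i, name in enumerate(names)
        }
    return config


# The example value from the table row for SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR.
example = "1;32M;host:1M,4,64K;device:1M,4,64K;shared:0,0,2M"
parsed = parse_usm_allocator(example)
print(parsed)
```

Under these assumptions the example enables buffer pooling, caps the pool at 32M, and gives each of `host`, `device` and `shared` its own `MaxPoolableSize`, `Capacity` and `SlabMinSize`.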
@@ -260,7 +260,7 @@ variables in production code.</span>
260
260
|`SYCL_PI_LEVEL_ZERO_USM_RESIDENT`| Integer | Bit-mask controls if/where to make USM allocations resident at the time of allocation. Input value is of the form 0xHSD, where 4-bits of D control device allocations, 4-bits of S control shared allocations, and 4-bits of H control host allocations. Each 4-bit component is holding one of the following values: "0" - then no special residency is forced, "1" - then allocation is made resident at the device of allocation, or "2" - then allocation is made resident on all devices in the context of allocation that have P2P access to the device of allocation. Default is 0x002, i.e. force full residency for device allocations only. |
261
261
|`SYCL_PI_LEVEL_ZERO_USE_NATIVE_USM_MEMCPY2D`| Integer | When set to a positive value enables the use of Level Zero USM 2D memory copy operations. Default is 0. |
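
As a rough illustration of the `0xHSD` layout described for `SYCL_PI_LEVEL_ZERO_USM_RESIDENT`, the sketch below splits a value into its three 4-bit components. The helper name `decode_usm_resident` and the `MODES` table are hypothetical and only restate the row's description; the adapter's own decoding may differ in detail.

```python
# Meaning of each 4-bit component, as described in the table row.
MODES = {
    0: "no special residency is forced",
    1: "resident at the device of allocation",
    2: "resident on all devices in the context with P2P access",
}


def decode_usm_resident(value):
    """Split a 0xHSD bit-mask string into host, shared and device components."""
    mask = int(value, 0)  # base 0 accepts hex strings such as "0x002"
    return {
        "device": mask & 0xF,         # D: lowest 4 bits
        "shared": (mask >> 4) & 0xF,  # S: middle 4 bits
        "host": (mask >> 8) & 0xF,    # H: highest 4 bits
    }


default = decode_usm_resident("0x002")
for kind, mode in default.items():
    print(kind, "->", MODES.get(mode, "unknown value"))
```

For the documented default `0x002`, only the device component is non-zero, which matches the row's statement that full residency is forced for device allocations only.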