You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
$cat sycl_supercontext.cpp
#include <CL/sycl.hpp>
#include <iostream>
#include <vector>
#include <numeric>
/* Send data in a circular manner in all devices of a GPU platform */
constexpr int N=10;
int main() {
sycl::gpu_selector device_selector;
sycl::platform platform(device_selector);
std::vector<sycl::device> gpu_devices = platform.get_devices() ;
const int N_devices = gpu_devices.size() ;
if ( gpu_devices.size() <= 1 ) {
std::cerr << "Warning, only one device on this platform."
<< " This test is not testing P2P" << std::endl;
}
sycl::context super_ctxt = sycl::context(gpu_devices);
std::vector<sycl::queue> sycl_queues;
std::vector<float *> sycl_ptr;
// Alocate memory to each device. Each queue share the same context
for (int i = 0; i < N_devices; i++) {
sycl_queues.push_back(sycl::queue(super_ctxt, gpu_devices[i]) );
float * ptr = sycl::malloc_device<float>(N, sycl_queues[i]);
sycl_ptr.push_back(ptr);
}
// Allocate host data, and set the value
std::vector<float> host_ptr(N);
std::iota(host_ptr.begin(), host_ptr.end(), 0);
// Copy the data to the first device, and 0ed the host data
sycl_queues[0].memcpy(sycl_ptr[0], host_ptr.data(), N*sizeof(float)).wait();
std::memset(host_ptr.data(), 0, N*sizeof(float));
// The circular exchange
for (int i = 0; i < N_devices; i++) {
float * src = sycl_ptr[i];
float * dst = sycl_ptr[ ( i+1 ) % N_devices ];
sycl_queues[i].memcpy(dst, src, sizeof(float)*N).wait();
}
// Copy back data and check for correctness
sycl_queues[0].memcpy(host_ptr.data(), sycl_ptr[0], N*sizeof(float)).wait();
for( int i = 0; i < N; i++) {
assert( host_ptr[i] == i );
}
return 0;
}
$dpcpp sycl_supercontext.cpp
$./a.out
Segmentation fault (core dumped)
The OpenCL command who segfault is clEnqueueMemcpyINTEL or zeCommandListAppendMemoryCopy
Environment (please complete the following information):
In https://intel.github.io/llvm-docs/MultiTileCardWithLevelZero.html#memory, On the malloc_device you have this sentence: Allocation can only be accessed by the specified device but not by other devices in the context nor by host.
In malloc_shared I saw No explicit copy is necessary for synchronizing between the host and the device, but it is needed for other devices in the context.
This explains why the code I put below is not working, indeed it's using alloc_device. When using malloc_shared the code work.
It looks like the Specification is a little vague about the behavior of alloc_device in the case of super-context. If using a P2P access in property list will allow another device in the same context to use this memory?
Describe the bug
Memcopy between devices sharing the same context doesn't work. This bug is only triggered by a platform that contains multiple devices.
This is not quite a common pattern, but this will be in a near future.
Bug confirm with OpenCL and LevelZero backend
To Reproduce
The OpenCL command who segfault is
clEnqueueMemcpyINTEL
orzeCommandListAppendMemoryCopy
Environment (please complete the following information):
The text was updated successfully, but these errors were encountered: