-
Notifications
You must be signed in to change notification settings - Fork 769
[CUDA] P2P buffer/image memory copy #4401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from 9 commits
Commits
Show all changes
26 commits
Select commit
Hold shift + click to select a range
4a71379
Implemented P2P copies for the cuda backend using buffers.
c771093
Switched to using vendor name in P2P info query.
b4a99ff
Merge branch 'intel:sycl' into P2P
JackAKirk b38d5e2
Merge branch 'intel:sycl' into P2P
JackAKirk 6abe9fb
Corrected the scoped context in guessLocalWorkSize to prevent stale c…
a3c251e
Added binary device query for P2P memcpy instead of platform query.
JackAKirk 3cd6911
Corrected formatting.
c384fbe
Corrected Formatting.
27c073d
Placed new PI API's after piTearDown.
60f276d
Renamed piextP2P as piextDevicesSupportP2P.
835e5c4
Made check that devices backends match before P2P query.
9e3ef85
Corrected formating in graph_builder.cpp.
0f36b94
Merge branch 'intel:sycl' into P2P
JackAKirk 0d819c0
Replaced binary device query with device_info call returning a vector…
43f970c
Removed piext Peer functions, replaced them with existing PI copy calls.
9b9a21f
Fixed formatting issues following previous commit.
52cae9f
P2P copies made for 1D image arrays again.
24b240a
Return retError from call to commonEnqueueMemImageNDCopyPeer.
99e58f1
Removed all changes to memory_manager.cpp:
fd15910
Superficial change to make the memory_manager.cpp diff empty.
ccba446
Applied stylistic/general improvements.
0f95aa7
Merge branch 'sycl' into P2P
JackAKirk 7057849
Reverted change to guessLocalWorkSize: unnecessary since #4606.
9d5a84a
Merge branch 'intel:sycl' into P2P
JackAKirk b64484d
Implemented PI_DEVICE_INFO_P2P_READ_DEVICES piDeviceGetInfo case in o…
JackAKirk ebfdb2a
Merge branch 'intel:sycl' into P2P
JackAKirk File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need these new API? why wouldn't regular copy API perform P2P copies transparently under the hood?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regular API takes a single pi_queue whereas the Peer API requires a second queue as an argument (principally so that the second context is known). The regular API is an OpenCL interface so cannot be changed. I think that a single API could be used if the new piext*** API was used in the runtime to replace the regular copy API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pi_mem src & dst are created with interfaces that have context, e.g. piextUSMDeviceAlloc or piMemBufferCreate, so backends already know the context of both src and dst, and can act on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I missed that when I checked out pi_mem. There is another reason for providing both queues from e.g. this snippet in
cuda_piextEnqueueMemBufferCopyRectPeer in pi_cuda.cpp line 4054:
We wait on events associated with the source queue and return the event associated with the dest queue.
There were problems associated with returning the event associated with the src queue. Since the contexts can be found from pi_mem I will look at the again and see if there is a way of doing things without the second queue argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that it is fine to only pass a single queue in the PI which acts as the command_queue, and it is fine for either the src_queue or the dst_queue to act as the command queue.
It is my current understanding that all implementation details of implicit peer to peer memory copy calls for buffer memory between devices sharing a SYCL context should be dealt with by the PI, such that the only implicit peer to peer memory copy case that should be dealt with by the runtime (via memory_manager) is the cross context case.
I will implement the peer to peer via a call to piEnqueueMemBufferCopy from memory_manager as suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now made the changes that I described above, implementing the peer to peer copy via a call to piEnqueueMemBufferCopy from memory_manager as suggested.