Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
[Doc] Add design doc for dynamic linking of device code feature #3210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Doc] Add design doc for dynamic linking of device code feature #3210
Changes from 6 commits
a27bcbf
6fe222d
604909c
df953fc
702e1a4
60054b1
459730b
7f95079
93e202c
b8fb778
850b94f
d176e1c
fbb67d1
7b7aa66
9f2b787
b496ef4
03f6b9f
8ba2c92
7b60419
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some parts of the reqs seem too specific and not general enough. As a variant:
The presented dynamic device code linkage mechanism must:
dlopen
SYCL_EXTERNAL
function references within the SYCL app to their definitions (if found) within any suitable dynamically linked device binary imageSYCL_EXTERNAL
functions across the dynamic linkage boundaries within the device code - taking a pointer, call through a pointer.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this point:
Do you mean some specific SYCL API? Because if dlopen is used it will be just specific usage of dynamically linked code that "embedded into a host shared object by standard SYCL compiler driver invocation".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant some new API, which would load a device library from disk (optionally returning
device_image
), and automatically make symbols defined in the loaded binary image available to SYCL runtime for resolution:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks. I mentioned that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need a new API? Can't we just suggest compiling marked with SYCL_EXTERNAL as a regular "fat" .so and then dlopen it? dlopen'ing should trigger device image registration, so such a device image should be available for symbols resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for delay. Yes, you are right, that should be the preferred way. But sometimes there is a need to load x.spv originating from different tools - e.g. OpenCL. It would be UB trying to define kernels in this x.spv and then trying to call them from SYCL, but it should be perfectly OK to link x.spv with .spv originating from SYCL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found it very difficult to understand the big picture for the DPC++runtime changes. It would be nice to have an overview up front. Maybe something like this:
Note that I'm not really sure this is how the design will work, but I think this is what you intend. Can you confirm if my overall understanding of the runtime design is accurate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your understanding is correct, except that the design doesn't actually depend on device images format. I.e. the algorithm of searching won't be changed if device images are pre-compiled native device binaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, the symbol-search part of the algorithm can probably work for device images in native code format. However, I think the online linking part will only work for device images in SPIR-V format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wanted to stress that I think it's important to add some description similar to what I suggest to the introduction to the "DPC++ runtime changes" section. I think this will clarify the following points which are currently unclear:
The
SYCL/imported symbols
andSYCL/exported symbols
attributes are used only by the runtime's symbol resolution algorithm. The PI API layer relies on the SPIR-VImport
andExport
linkage annotations, so these must also be present.The linking part of the design works only for device images in SPIR-V format.
The only things added to the cache are fully linked device images. We do not cache device images with
SYCL_EXTERNAL
functions that are compiled and not yet linked.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree with you here. Yes, right now we don't have backends that support linking of native device binaries. But we don't say that it is not possible in the future. I think format of device image shouldn't affect design of runtime changes. Most likely when linking of native device binaries is supported, it won't matter for runtime which format device image has, it will just call some PI API for link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smaslov-intel, my understanding is that in OpenCL terminology,
compiled
!=built
.compiled
is an object form, i.e. front-end is passed, but no executable binary was produced yet. I can't find strong proof of that in the spec, but I'm sure that our (Intel) CPU & GPU compilers won't accept native binaries inclLinkProgram
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.khronos.org/registry/OpenCL/sdk/2.2/docs/man/html/clCompileProgram.html
Compile does produce "binary", otherwise the above wouldn't be possible.
Who can clarify it exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @bashbaug should be able to clarify this.
In the meantime:
From clCreateKernel:
From clBuildProgram:
From clLinkProgram:
From clCompileProgram:
It seems to me that intent was that
clCompileProgram
does not generate an executable, but only creates some intermediate object file insteadThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1.
From clGetProgramInfo(https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clGetProgramInfo.html) desc:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we somehow query OpenCL about what is in the binary such that we could properly control it's linking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest that we add a sub-section about this
PI API
below: this API is important, as this is a part of interface between plugins and DPC++ RT and we don't even have its name listed here.The sub-section could describe the API, its behavior (whether we assume that it is capable to link native binaries regardless of device or that it is some optional capability which should be checked before usage), possible implementation/limitations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, please see lines 236-259.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a
unordered_multimap
to handle use-case of kernels with the same name in different objects?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think use-case when kernels have the same name is not really legal. SYCL 2020 spec says "In the case of the shared-source compilation model, the kernels have to be uniquely identified by both host and device compiler.". So if two kernels (even if they are in different objects) have the same name, they don't really uniquely identified, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have the following use-case:
Is this use-case handled by SYCL 2020 spec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that SYCL 2020 (or other) spec say something about such use cases, especially when libraries are involved. However, looking at this comment #3210 (comment) , I think this case is illegal. @gmlueck , could you please confirm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is illegal because the spec says this:
The spec could certainly be more clear about this, though. I'll propose updated wording to the SYCL 2020 spec clarifying that a kernel name must be unique across the entire application.
Note that it is legal for two translation units (or two libraries) to call the same kernel, in which case both instances of the kernel will have the same name. For example, consider a named kernel object that is called from two different libraries:
In such a case, both libraries will have a definition of the kernel named
MyKernel
. The design will need to support cases like this.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, there can be two cases when
SYCL_EXTERNAL
-only images are cached in in-memory cache:main
image, i.e. after it is linked with all its dependencies (the case that you described in your suggestion)SYCL_EXTERNAL
-only images in compiled state. It can be done when some "main" image requested someSYCL_EXTERNAL
-only image as dependency, we created a program forSYCL_EXTERNAL
-only image, compiled it and stored in cache, so we don't have to do it again when some othermain
requests the sameSYCL_EXTERNAL
image. (the case that I tried to describe here). The idea was taken from existing cache for standard device libraries (seellvm/sycl/source/detail/context_impl.hpp
Line 177 in 86716c5
Does it make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current approach assume storing final binary after linking it means:
main
program;main
images in persistent cache are invalidated and JIT is initiated.May be it is reasonable to store library
images
in cache separately frommain
image? It can be done both for in-memory and persistent or only for persistent cache depending onweight
of link operation for binary programs.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean compile to binary "main" image and "library" image and store them on disk separately? It sounds reasonable (the real dynamic libraries actually work this way), however I don't think we have devices that support linking of native device binaries yet, so right now It doesn't seem possible.