Skip to content

PI object created from native handles take ownership #1516

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
steffenlarsen opened this issue Apr 14, 2020 · 19 comments
Closed

PI object created from native handles take ownership #1516

steffenlarsen opened this issue Apr 14, 2020 · 19 comments
Assignees

Comments

@steffenlarsen
Copy link
Contributor

#1332 introduces a number of PI API functions for creating various PI objects from native objects (piext*CreateWithNativeHandle.) However, given that the CUDA backend is unable to share a reference count between the native object and the native object, like the OpenCL backend does, two solutions were proposed:

  1. The user keeps ownership of the native object, which would require the CUDA backend to have a special case for PI objects that would not destroy the native object upon its own destruction. This would require the user to keep the native object alive while using the corresponding PI objects.

  2. The created PI object takes ownership of the native object, which doesn't require change to neither of the backends, but can be implemented by not having these functions retain the native objects in the OpenCL BE. This requires that the user keeps the PI object alive while the native object is alive.

Of these options the second was chosen. With this implementation, in order to adhere to the requirement from the SYCL 1.2.1 specification that interoperability constructor must retain the OpenCL object, cl*Retain is called in those constructors.

Currently these constructors are the only ones using the piext*CreateWithNativeHandle at the moment, though they are expected to be a somewhat directly called by a future make function similar to the new get_native functions. At that point, the lifetime choices made for native objects with piext*CreateWithNativeHandle becomes exposed to the user.

The point of this issue is to review these choices and make sure we reach a solution that we are happy with before this is directly exposed to the SYCL users. See discussion on the subject here: #1332 (comment).

@smaslov-intel
Copy link
Contributor

in order to adhere to the requirement from the SYCL 1.2.1 specification that interoperability constructor must retain the OpenCL object, cl*Retain is called in those constructors.

I think it is important that the PI objects created with the piext*CreateWithNativeHandle are retained, i.e. one does not need to call pi*Retain after creating. Can we move the cl*Retain into OpenCL plugins? And have other plugins follow the rule: created PI object is retained, which means it's reference counter set to 1 in the plugins where reference counter is not shared with native RT.

@steffenlarsen
Copy link
Contributor Author

in order to adhere to the requirement from the SYCL 1.2.1 specification that interoperability constructor must retain the OpenCL object, cl*Retain is called in those constructors.

I think it is important that the PI objects created with the piext*CreateWithNativeHandle are retained, i.e. one does not need to call pi*Retain after creating. Can we move the cl*Retain into OpenCL plugins? And have other plugins follow the rule: created PI object is retained, which means it's reference counter set to 1 in the plugins where reference counter is not shared with native RT.

My primary concern is that this would make a difference in how native objects are handled between the OpenCL BE and the CUDA BE.

For the sake of argument, assume we do as you suggest and have +1 reference count on the created object. Say now that we have the make function from the generalization proposal that creates a SYCL object from a native object, which would primarily use the new piext*CreateWithNativeHandle PI API function.

Now we imagine a very simple use case where a user creates a SYCL object from a native handle, uses the SYCL object, and wants to clean up after it. Using the CUDA backend this would be done as:

  1. Create a SYCL object from the CUDA object using the make function. The underlying PI object will have a reference count of 1.
  2. Do some operation with the created SYCL object.
  3. Let the SYCL object go out of scope, in which case it will release the PI object reducing the reference count to 0 at which point it will release both the PI object and the CUDA object given at creation.

However, for the OpenCL BE it would require something along the lines of:

  1. Create a SYCL object from the OpenCL object using the make function. Assuming that this object has only been created and has not been used by any operation the reference count of both the PI object and the given OpenCL object is now 2.
  2. Do some operation with the created SYCL object.
  3. Let the SYCL object go out of scope, in which case it will release the PI object reducing the reference count to 1.
  4. Call cl*Release on the OpenCL object reducing its reference count to 0 at which point it is destroyed.

This is the primary reason for the two options presented originally, specifically;

  1. The user keeps ownership of the native object, which would require the CUDA backend to have a special case for PI objects that would not destroy the native object upon its own destruction. This would require the user to keep the native object alive while using the corresponding PI objects.
  2. The created PI object takes ownership of the native object, which doesn't require change to neither of the backends, but can be implemented by not having these functions retain the native objects in the OpenCL BE. This requires that the user keeps the PI object alive while the native object is alive.

Since we've elected to implement the latter we need to make sure that step 4. of the OpenCL example above need not be done, as it would have taken ownership. Of course, if the user had done their own retains they should also do the same amount of releases, but they shouldn't have to do a release of the object from the creation of it.

What you suggest is probably more in line with option 1. however, in which case we would have to prevent the CUDA backend from destroying CUDA objects used in a make call, adding a 4th step to the CUDA example having the user destroy the CUDA object explicitly after the SYCL object has been destroyed. The primary problem with this is however that the user needs to make sure not to destroy the CUDA object before the PI object created from it has been destroyed, which is not always easily visible as for example a SYCL queue may keep a SYCL context alive, even after the user has lost all references to this SYCL context.

@steffenlarsen
Copy link
Contributor Author

Ping @smaslov-intel.

@romanovvlad may also have some input regarding this?

@smaslov-intel
Copy link
Contributor

Of these options the second was chosen. With this implementation, in order to adhere to the requirement from the SYCL 1.2.1 specification that interoperability constructor must retain the OpenCL object, cl*Retain is called in those constructors.

The retain is called for OpenCL BE and so there needs to be step 4 for the release, right?
What is done on construction from the CUDA BE?

Since we've elected to implement the latter we need to make sure that step 4. of the OpenCL example above need not be done

Can we release the OpenCL native handle as many times as needed for its reference counter go to 0 at the destruction of the PI object owning that native handle?

@steffenlarsen
Copy link
Contributor Author

The retain is called for OpenCL BE and so there needs to be step 4 for the release, right?
What is done on construction from the CUDA BE?

For the OpenCL interoperabilty constructor that takes OpenCL objects that is the case, but those constructors cannot be used with the CUDA backend. That is why the cl*Retain was moved to these constructors, as it allows the generalized maker-functions to adopt a more general approach.

Can we release the OpenCL native handle as many times as needed for its reference counter go to 0 at the destruction of the PI object owning that native handle?

In principle we could probably do this as the reference count is exposed through the info queries, though the specification, in this case taken from clGetContextInfo, specifies that

The reference count returned should be considered immediately stale. It is unsuitable for general use in applications. This feature is provided for identifying memory leaks.

Additionally, which is likely part of why the previous quote is in the specification, we could have a case where the user have passed the OpenCL object to an OpenCL operation that retained the object, and while the operation is running the user creates a SYCL object from the object, using the future make function. If we were to make sure that the reference count is exactly 1 once the SYCL object is created, the OpenCL operation could finish and destroy the object for good.

However, my reasoning behind it being enough to just not retain the OpenCL object when creating a SYCL object is that the SYCL object gets the responsibility of the single reference that was made upon creation of the OpenCL object. Effectively this means that a user that has called cl*Retain N times on an OpenCL object that is owned by a SYCL object must call cl*Release N times on that object as well, rather than N+1 times. Since CUDA doesn't have retain functions the rules are the same, where we always have N=0.

@romanovvlad
Copy link
Contributor

romanovvlad commented May 8, 2020

Trying to understand the topic...
Do I understand correctly that generalization proposal requires that OpenCL native handles are retained and released in the same way like in SYCL1.2.1, but says nothing about non-OpenCL backends?
I think this is a problem as requires us and users to do different things for different backends which can be confusing. Specifically behavior of make factories are different depending on backends:

  1. For CUDA a user give ownership of a handle to a SYCL object and have to be super safe to not release the handle passed.
  2. For OpenCL a user shares ownership and can continue using it safely.

Probably it would be better to define a common behavior that can be supported by all backends:
a user gives ownership of a native handle to a SYCL object and cannot use the handle while the SYCL object is alive(similar to the host ptr argument of the sycl::buffer.
With current behavior I believe retain and release function(see [UPD]) in CUDA BE should be NOPs as there is no reference counting for native CUDA handles so it doesn't make any sense to emulate it.
Please, correct me if I'm wrong.

[UPD] release function is still needed for CUDA BE in current implementation.

@steffenlarsen
Copy link
Contributor Author

Do I understand correctly that generalization proposal requires that OpenCL native handles are retained and released in the same way like in SYCL1.2.1, but says nothing about non-OpenCL backends?

The only thing I see in the proposal is regarding the get functions here, so I'm going to assume that we have free-reign when it comes to make functions. The fact that get has these semantics may be another problem however, but that's besides this issue.

I think this is a problem as requires us and users to do different things for different backends which can be confusing. Specifically behavior of make factories are different depending on backends:

  • For CUDA a user give ownership of a handle to a SYCL object and have to be super safe to not release the handle passed.
  • For OpenCL a user shares ownership and can continue using it safely.

Probably it would be better to define a common behavior that can be supported by all backends:
a user gives ownership of a native handle to a SYCL object and cannot use the handle while the SYCL object is alive(similar to the host ptr argument of the sycl::buffer.

This is what we currently have, that is the SYCL object takes some ownership of the native object. This effectively means that the users will only have to release a native object if they have retained it explicitly. For OpenCL this means that the SYCL object simply doesn't retain the native object upon creation, so it takes responsibility of releasing the one reference from creation, but any additional retains (either by the user or from e.g. library calls) are not the responsibility of the SYCL object. Since CUDA doesn't have reference counting on native objects the user cannot retain, so if a SYCL object is created from a native object it will also destroy it. I believe this makes the ownership issue somewhat more consistent between the backends.

With current behavior I believe retain and release function(see [UPD]) in CUDA BE should be NOPs as there is no reference counting for native CUDA handles so it doesn't make any sense to emulate it.

In the CUDA backend the retain and release is done on the PI object. Unlike with the OpenCL backend, where the PI objects can share the reference counts with the native objects, CUDA objects are not reference counted. Either way, the user should not have access to any sort of retain or release in the CUDA BE.

@gmlueck
Copy link
Contributor

gmlueck commented Jul 10, 2020

@smaslov-intel recently pointed me to this issue and asked me to comment. I'm a member of the SYCL language committee, and I think I have some understanding of the backend generalization proposal as it is specified in SYCL 2020 provisional. The SYC spec defines a set of common APIs for interoperating between SYCL objects and native backend objects. However, the exact semantics of these APIs can be different for each backend. In particular, the lifetime guarantees about the native objects encapsulated by a SYCL object are not specified in SYCL 2020. Instead, each backend should have its own "backend specification document" which documents these details.

For example, a SYCL application can create a SYCL object from a native object via one of the make_OBJECT() free functions, which takes the native object as a parameter. However, the SYCL spec does not discuss the lifetime requirements of the native object after this is done. Instead, this must be documented by each backend's "backend specification".

Likewise, a SYCL application can get an underlying native object from a SYCL object by calling one of the get_native() functions, which takes the SYCL object as a parameter. However, the SYCL spec does not discuss the lifetime of the returned native object. Again, this must be documented by each backend's "backend specification".

All mention of reference counting of native objects has been removed from the SYCL 2020 spec. Therefore, it is not necessary for the underlying native objects to even support the concept of reference counting.

The OpenCL "backend specification" (which is an appendix in SYCL 2020) does rely on reference counting of native objects, but this is specific to the OpenCL backend. Other backends need not do this. For OpenCL, make_OBJECT() does a retain on the native object, so the caller could legally release the native object immediately after calling make_OBJECT() and this would have no effect on the newly created SYCL object.

Other backends need to decide their own lifetime rules. I'm not that familiar with CUDA, but I know that Level 0 does not support reference counted objects. (From the conversation above, it sounds like CUDA also does not support reference counted objects.) Therefore, it is not possible for the Level 0 version of make_OBJECT() to behave the same as OpenCL. I think there are two possibilities:

  1. Calling make_OBJECT() assumes ownership of the native object, and the SYCL runtime will destroy the native object when the SYCL object is destroyed.

  2. Calling make_OBJECT() does not assume ownership. Instead, the caller must guarantee that it keeps the native object alive at least for the duration of the SYCL object and for the duration of any objects copied from that SYCL object. This includes any SYCL objects that indirectly contain the native object. For example, queue contains a context, so if the context was created from a native object then the native object must live at least as long as any queue that contains it.

Personally, I think the rules for (2) might get confusing, especially when one SYCL object contains another SYCL object. Therefore, (1) might be less confusing for customers.

Another possibility would be to provide a parameter that lets the user choose between these two behaviors. For example, the Level 0 version of make_OBJECT() could provide an additional bool parameter that chooses between behavior (1) vs. (2). However, this would require a change to the SYCL spec, since currently SYCL 2020 provisional requires all backends to have the same interface to the make_OBJECT() functions. The spec is provisional at this point, though, so changes like this could be made if there is a motivation.

@steffenlarsen
Copy link
Contributor Author

Thank you for your insight, @gmlueck !

My biggest concern here is how transparent the necessary lifetime of the native handle is, because if the user keeps sole ownership of a native handle they must make sure that it's not used by any SYCL object still alive when they destroy the resource. Expressing this to the user may become problematic because they only know of the reference that they're given to the SYCL object they created with the native handle, but even after that is gone there may be other SYCL objects keeping that exact SYCL object alive. Tracking that lifetime without having a view of the underlying runtime is not trivial and can easily lead to very confusing errors that will be extremely difficult to debug.

On the other hand, relinquishing all user ownership of the native object when creating a SYCL object means that the user will not have to worry about the native handle anymore and all they have to do is make sure that the created SYCL object stays alive for the time that they need the native handle.

Granted there are problems with both and I can imagine there are use-cases speaking for/against one or the other. However, I believe you are right in that SYCL 2020 makes this backend-specific, so maybe it's time to do let OpenCL be OpenCL and let the internal reference count do its job.

Another possibility would be to provide a parameter that lets the user choose between these two behaviors. For example, the Level 0 version of make_OBJECT() could provide an additional bool parameter that chooses between behavior (1) vs. (2). However, this would require a change to the SYCL spec, since currently SYCL 2020 provisional requires all backends to have the same interface to the make_OBJECT() functions. The spec is provisional at this point, though, so changes like this could be made if there is a motivation.

That may not be a bad solution to this. Maybe have something similar to property_list with backend-specific options for different aspects of this? Chances are that when we get further into this there will be some options that are valid for all backends as well.

@gmlueck
Copy link
Contributor

gmlueck commented Jul 13, 2020

Expressing this to the user may become problematic because they only know of the reference that they're given to the SYCL object they created with the native handle, but even after that is gone there may be other SYCL objects keeping that exact SYCL object alive.

I agree with you here, which is why I suggest option (1) in my list above.

That may not be a bad solution to this. Maybe have something similar to property_list with backend-specific options for different aspects of this? Chances are that when we get further into this there will be some options that are valid for all backends as well.

Adding a property list is not a bad idea. Another option is to just remove the interop API prototypes from the SYCL 2020 spec and require each backend to define these. I'm not sure how useful it is to require each backend to have the same API if each backend will have a different semantic.

Regarding the L0 backend specifically, do others think it is useful to provide a parameter of some sort to choose between my options (1) and (2)? If there are no use cases for option (2), there's no sense in defining a parameter.

@steffenlarsen
Copy link
Contributor Author

I agree with you here, which is why I suggest option (1) in my list above.

Ah, apologies. I somehow convinced myself it was referring to (1) in the issue. Then I think we're on the same page.

@smaslov-intel
Copy link
Contributor

Currently these constructors are the only ones using the piext*CreateWithNativeHandle at the moment, though they are expected to be a somewhat directly called by a future make function similar to the new get_native functions. At that point, the lifetime choices made for native objects with piext*CreateWithNativeHandle becomes exposed to the user.

Since this issue was created I already added some of make_OBJECT() functions for both OpenCL and level-Zero: https://github.com/intel/llvm/blob/sycl/sycl/source/backend/opencl.cpp and https://github.com/intel/llvm/blob/sycl/sycl/source/backend/level_zero.cpp.

I believe what's implemented matches the agreement between @steffenlarsen and @gmlueck that the SYCL object takes ownership (OpenCL plugin does not retain native objects) and that @smaslov-intel wanted that no explicit call to pi*Retain is needed after piext*CreateWithNativeHandle (Level-Zero starts reference counting at 1). Please correct me if I am missing something.

@gmlueck
Copy link
Contributor

gmlueck commented Jul 14, 2020

I believe what's implemented matches the agreement between @steffenlarsen and @gmlueck that the SYCL object takes ownership (OpenCL plugin does not retain native objects) and that @smaslov-intel wanted that no explicit call to piRetain is needed after piextCreateWithNativeHandle (Level-Zero starts reference counting at 1). Please correct me if I am missing something.

The discussion about calling pi*Retain() seems only marginally related to the issue here. The real question is whether the L0 implementation eventually calls ze*Destroy() for the native object when the SYCL object is destroyed. Likewise, the OpenCL implementation should call clRetain*() on the native object when creating the SYCL object.

I skimmed through the L0 code, and I didn't see any problem except that L0 never calls zeModuleDestroy() for any SYCL program object. I'm fixing this bug as part of a different change.

I don't think the OpenCL code is correct. According to the SYCL 2020 OpenCL backend specification, creating a SYCL object from a native object should increment the reference count on the underlying OpenCL object (i.e. call clRetain*()):

SYCL runtime classes which encapsulate an OpenCL opaque type such as SYCL context or SYCL queue must
provide an interoperability constructor taking an instance of the OpenCL opaque type. These constructors must
retain that instance to increase the reference count of the OpenCL resource.

However, I don't see where that is happening. For example, this is from "sycl/plugins/opencl/pi_opencl.cpp":

pi_result piextProgramCreateWithNativeHandle(pi_native_handle nativeHandle,
                                             pi_context,
                                             pi_program *piProgram) {
  assert(piProgram != nullptr);
  *piProgram = reinterpret_cast<pi_program>(nativeHandle);
  return PI_SUCCESS;
}

Is clRetainProgram() called from some other place?

@smaslov-intel
Copy link
Contributor

L0 never calls zeModuleDestroy() for any SYCL program object.

Right, this is a known issue, and there is a TODO asking to call the zeModuleDestroy although it is currently asking to do so only for non-interop midules, while we should do so for any module, assuming the ownership.

provide an interoperability constructor taking an instance of the OpenCL opaque type. These constructors must
retain that instance to increase the reference count of the OpenCL resource

This is talking about constructors not the make_OBJECT() factory functions. The clRetainProgram is called from program::program(const context &context, cl_program clProgram). Honestly, I thought these interoperability constructors would be gone from SYCL 2020 in favor of make_OBJECT(), which wouldn't have such requirement. If this requirement to retain OpenCL native objects stays, how are we able to take the ownership?

@gmlueck
Copy link
Contributor

gmlueck commented Jul 14, 2020

This is talking about constructors not the make_OBJECT() factory functions. The clRetainProgram is called from program::program(const context &context, cl_program clProgram). Honestly, I thought these interoperability constructors would be gone from SYCL 2020 in favor of make_OBJECT(), which wouldn't have such requirement.

I agree that the wording should be clearer, but I believe this is talking about the make_OBJECT() factory functions. The true constructors taking a native object are not part of SYCL 2020 provisional. Further down in the "OpenCL backend specification" it describes the make_OBJECT() factory functions "constructing" the SYCL objects. For example, this is the description of make_context():

Constructs a SYCL context instance from an OpenCL cl_context in accordance with the requirements described in 4.5.2.

If this requirement to retain OpenCL native objects stays, how are we able to take the ownership?

The L0 and the OpenCL backends have different semantics for make_OBJECT(). The L0 backend "takes ownership" of the native object, but the OpenCL backend does a "retain" on the native object.

@smaslov-intel
Copy link
Contributor

Why can't we go with "takes ownership" universally, i.e. remove the requirement to retain native OpenCL object from SYCL 2020 ?

@gmlueck
Copy link
Contributor

gmlueck commented Jul 14, 2020

I have two answers:

  1. If we try to come up with an interop semantic that is exactly the same for all possible backends, we'll end up with a "lowest common denominator" interface that won't make anyone happy. The "retain" reference count is a very natural API for OpenCL, so it makes sense to use it for OpenCL interop. How would OpenCL users feel if we told them we cannot support reference counts for OpenCL interop only because some other backend can't support them?

  2. I'm not sure what "takes ownership" even means for OpenCL. Since OpenCL objects are reference counted, there's no single "owner" for an object. Instead, objects are always shared. If we wanted to support "takes ownership" in OpenCL interop, we'd have to somehow force the reference count to be 1 in the make_OBJECT() API. I don't think there's any way to set the reference count to a specific value in OpenCL, so I don't think it would be possible to "take ownership".

@steffenlarsen
Copy link
Contributor Author

At the time of implementing the interoperability operations, focus was on trying to make the PI interoperability functions as consistent between backends as possible. For make_OBJECT() his was - and I believe still is - accomplished by having the created PI object "take ownership of one reference", that is for reference counted native object like OpenCL it would, at the end of the SYCL object's lifetime, decrement the reference count of the OpenCL object one more time than the number of times it incremented it. This allows the native object to, for example, be used by other libraries without having lifetime problems. For CUDA this was done similarly, but since CUDA objects aren't reference counted, it would only ever release the native object once. This was the best attempt at making it somewhat consistent.

That said, I agree with @gmlueck that given the freedom offered by it being backend-specific behaviour I think the PI API should allow the same freedom. After all, it is interop so users are likely only using one of the APIs anyway, and if not they should understand the differences between them.

@smaslov-intel
Copy link
Contributor

This can be closed as we had settled with a solution, which is different for different backends.
OpenCL: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:opencl:interfacing-with-opencl

The lifetime of a SYCL runtime class that encapsulates an OpenCL opaque type and the instance of that opaque type retrieved via the get_native() free function are not tied in either direction given correct usage of OpenCL reference counting. For example if a user were to retrieve a cl_command_queue instance from a SYCL queue instance and then immediately destroy the SYCL queue instance, the cl_command_queue instance is still valid. Or if a user were to construct a SYCL queue instance from a cl_command_queue instance and then immediately release the cl_command_queue instance, the SYCL queue instance is still valid.

Level Zero: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/LevelZeroBackend/LevelZeroBackend.md#44-level-zero-handles-ownership-and-thread-safety

By default, the ownership is transferred to the SYCL runtime, but some interoparability API supports overriding this behavior and keep the ownership in the application.

I am not aware of any CUDA backend specification, and one should probably be created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants