[RFC] Enable Intel GPU support in torchcodec (pytorch xpu backend device) #559
@dvrogozh, thanks for the interest in TorchCodec! Supporting multiple accelerators is exciting and something that will benefit our users. At the moment, I think the best way forward is to host this XPU support in a separate repo as a dynamically loadable module. What your PR clearly demonstrates is that TorchCodec is not quite ready for that - it's close, but not quite there. That is, you needed to change our core code directly. In order to get there, I think we need to take three steps:
I'm happy to work with you on the changes we'll need to make to the TorchCodec repo to enable a separate XPU device repo. Let me know if that's work you're interested in tackling!
@scotts : thank you for the feedback. I will be glad to work with you on preparing TorchCodec to support multiple accelerators - the steps 1-3) you highlighted, and others. Let's try to take the next steps next week, taking and refactoring pieces from the PR I provided. However, I disagree that accelerator-specific code should be placed in a separate repository. We have had this experience at Intel with a few other projects, including IPEX (the torch plugin for Intel GPU support), and it is clearly understood that this path has a number of issues. Such an external project goes out of sync with the upstream project, becomes difficult to maintain and, most importantly, is not appreciated by end users, who consider the additional steps to install plugin components unnecessary complexity. Ultimately we are looking for a way to bring Intel GPU support right into upstream TorchCodec itself, in a similar way to how it is now added into upstream pytorch via the XPU backend. We (Intel) will back this effort with the Intel-specific aspects of CI and packaging. I hope we can discuss the best way towards achieving this goal. Can you please share your opinion on the feasibility of this approach? I believe the 1-3) steps you shared above would be needed anyway. The part which probably needs discussion is the choice of ffmpeg backend, since there are a few available, each with its pros and cons. And there might be other technical requirements which you might wish to impose for multi-accelerator support.
@dvrogozh, as your PR is quite large, I think it may be easiest to start fresh. I'd also like to create a new issue focused on making our device interface generic and dynamically loadable. Let me know if you feel you have enough of an understanding of what needs to happen to start making progress, or if it would be valuable for me to sketch things out more. I agree that a separate repo adds friction for users who want to use TorchCodec with XPU acceleration. However, we don't have much signal right now on how much our users want XPU acceleration. I think that a separate repo is the best way to get that signal: if a lot of users start complaining about that friction, then we know it's wanted! Once we have clear demand, we can work on the best way to upstream XPU support directly in TorchCodec.
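To make the "generic and dynamically loadable device interface" idea concrete, here is a minimal sketch of a registration pattern an out-of-tree device repo could plug into. This is illustrative Python (the real implementation would be C++), and the names `DeviceInterface`, `register_device`, `create_device` and `XpuDevice` are hypothetical, not actual torchcodec API:

```python
from abc import ABC, abstractmethod

class DeviceInterface(ABC):
    """Hypothetical base class every device backend would implement."""

    @abstractmethod
    def convert_frame_to_tensor(self, frame):
        """Turn a decoded hardware frame into a tensor on this device."""

# Core library owns a registry; backends register themselves at load time.
_DEVICE_REGISTRY = {}

def register_device(device_type, factory):
    _DEVICE_REGISTRY[device_type] = factory

def create_device(device_type):
    if device_type not in _DEVICE_REGISTRY:
        raise ValueError(f"no backend registered for {device_type!r}")
    return _DEVICE_REGISTRY[device_type]()

# An out-of-tree XPU package could register itself when imported,
# without the core library knowing about XPU at all:
class XpuDevice(DeviceInterface):
    def convert_frame_to_tensor(self, frame):
        # Placeholder: real code would wrap an XPU surface as a torch tensor.
        return f"xpu-tensor({frame})"

register_device("xpu", XpuDevice)
```

With this split, the core repo only depends on the abstract interface, and a separate XPU repo (or, later, in-tree XPU support) is just one more `register_device` call.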
@scotts : I am working on the 1st item from your list (device agnostic
Filed:
@scotts : PR filed. Please take a look:
That's an RFC issue for the following PR:
The PR enables Intel GPU support in torchcodec by adding ffmpeg-vaapi decoding and connecting it with the pytorch XPU device backend.
There are a few items I would like to bring to discussion in this RFC issue.
Selection of ffmpeg backend. There are a few available for Intel GPU:
None of the above media backends is preferable in the sense of easier context/memory sharing. Intel media APIs, drivers and libraries do not directly intersect with the Intel compute stack. In particular, Intel compute's Unified Shared Memory pointers are not recognized by the media APIs and cannot be accepted directly. This means that memory sharing between media and compute must (at the moment) be done via lower-level APIs such as DMA fds on Linux and NT handles on Windows, which introduces an OS-specific dependency.
I suggest considering ffmpeg-vaapi for Linux and ffmpeg-dx12 for Windows to enable Intel GPUs. These backends are based on Intel media driver APIs. ffmpeg-qsv is based on a higher-level library (libvpl) and does not let us avoid the vaapi/dx dependency, since we will in any case need to use those APIs to get to the underlying surface memory. I also think that ffmpeg-vaapi is used by AMD GPUs, so adding VAAPI support can help there as well. (See also #444)
ffmpeg-vulkan might be interesting due to eventual cross-vendor capabilities. However, 1) media support in vulkan is relatively new and I am not sure that all required features are available (for example, color space conversion to RGBA), and 2) for Intel GPUs, media support in the vulkan driver is a community effort rather than an Intel effort and has a different implementation compared to the Intel media driver. Overall, I think ffmpeg-vulkan might be an interesting next step after enabling torchcodec with ffmpeg-vaapi.
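The per-OS backend suggestion above can be summarized as a small selection helper. This is an illustrative sketch (the helper name `pick_hw_backend` is hypothetical; real selection logic would live in torchcodec's build or runtime configuration), using the FFmpeg hwdevice names `vaapi` and `d3d12va`:

```python
import sys

def pick_hw_backend(platform=None):
    """Map an OS to the FFmpeg hardware backend proposed in this RFC.

    Hypothetical helper for illustration only.
    """
    platform = platform or sys.platform
    if platform.startswith("linux"):
        # Intel media driver on Linux; VAAPI is also used by AMD GPUs.
        return "vaapi"
    if platform.startswith("win"):
        # The suggested ffmpeg-dx12 path (FFmpeg hwdevice type d3d12va).
        return "d3d12va"
    raise NotImplementedError(f"no Intel GPU decode backend for {platform}")
```

ffmpeg-vulkan could later become a third branch here once its feature coverage is confirmed.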
Selection of color conversion algorithm. Following the current torchcodec architecture, color conversion of the decoded output (typically NV12) to RGB24 is needed. In the current implementation I chose to implement color conversion directly on VAAPI, since that is fairly quick and trivial. I believe that is good enough for a first implementation. I do suggest considering ffmpeg-vaapi for color space conversion going forward, but this will require additional effort on top of the PR I currently provide. A couple of notes on the conversion:
- torcheval is used for checks in tests (changed only for Intel GPU).

Planned changes.
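For reference, the NV12-to-RGB24 conversion discussed above amounts to the standard BT.601 limited-range matrix. Below is a minimal CPU sketch in NumPy with nearest-neighbor chroma upsampling; the PR itself performs this step on VAAPI, so this is only to make the math concrete:

```python
import numpy as np

def nv12_to_rgb(y_plane, uv_plane):
    """Convert an NV12 frame to RGB24 (BT.601, limited range).

    y_plane:  (H, W) uint8 luma plane.
    uv_plane: (H//2, W) uint8 interleaved U/V plane (U at even columns).
    """
    # Deinterleave chroma and upsample to full resolution (nearest neighbor).
    u = uv_plane[:, 0::2].repeat(2, axis=0).repeat(2, axis=1).astype(np.float32) - 128.0
    v = uv_plane[:, 1::2].repeat(2, axis=0).repeat(2, axis=1).astype(np.float32) - 128.0
    y = 1.164 * (y_plane.astype(np.float32) - 16.0)
    # BT.601 limited-range coefficients.
    r = y + 1.596 * v
    g = y - 0.813 * v - 0.392 * u
    b = y + 2.017 * u
    return np.clip(np.stack([r, g, b], axis=-1), 0, 255).astype(np.uint8)
```

Whether the conversion uses BT.601 or BT.709 coefficients (and full vs. limited range) is exactly the kind of detail that makes bit-exact test checks across devices tricky, hence the tolerance-based checks via torcheval mentioned above.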
CC: @scotts, @NicolasHug, @EikanWang