Replace decord with torchcodec #15022

Closed
hmellor wants to merge 10 commits

hmellor
Member

@hmellor hmellor commented Mar 18, 2025

As discussed in Slack.

The documentation for torchvision.io.read_video states that PyTorch's video decoding capability will soon be centralised in torchcodec. Therefore, it makes sense to skip the intermediate step of using torchvision.

OpenCV was considered, but it can only read videos from a path or URL. That would have meant writing the bytes to disk first, which was a deal breaker.

The main caveat of torchcodec at the moment is that it does not distribute ARM64 wheels for Linux; see pytorch/torchcodec#569.

FIX #15011

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the documentation, ci/build, and multi-modality (#4194) labels Mar 18, 2025
hmellor added 5 commits March 18, 2025 12:31
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
@hmellor hmellor marked this pull request as ready for review March 18, 2025 13:30
@DarkLight1337 DarkLight1337 requested a review from Isotr0py March 19, 2025 07:53
Collaborator

@Isotr0py Isotr0py left a comment

I'm fine with replacing decord with torchcodec on x86 platforms, but it's still meaningful to have some benchmark between torchcodec and opencv.

Signed-off-by: Harry Mellor <[email protected]>
@hmellor
Member Author

hmellor commented Mar 19, 2025

> but it's still meaningful to have some benchmark between torchcodec and opencv.

opencv can only decode videos directly from a path or URL. Unless we change our APIs, this would mean writing the video data to disk first, which rules it out anyway.
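For context, the disk round-trip being ruled out here would look roughly like the sketch below. It is a minimal illustration using only the standard library; `video_bytes_as_path` is a hypothetical helper name, not something from this PR, and the decoder that would consume the path is left out.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def video_bytes_as_path(data: bytes, suffix: str = ".mp4") -> Iterator[str]:
    """Write in-memory video bytes to a temporary file and yield its path.

    Hypothetical helper: it only illustrates the extra disk write that a
    path-only decoder would force on callers that already hold the bytes.
    """
    # delete=False so a second reader can reopen the file by name on any OS.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.write(data)
        tmp.close()  # flush to disk before handing the path to a decoder
        yield tmp.name
    finally:
        if not tmp.closed:
            tmp.close()
        os.unlink(tmp.name)  # always clean up the temporary file
```

Every request would pay for one full write and one full read of the video on top of decoding, which is the cost objected to above.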

Signed-off-by: Harry Mellor <[email protected]>

Co-authored-by: Isotr0py <[email protected]>
@jeejeelee
Collaborator

jeejeelee commented Mar 19, 2025

maybe we can support opencv and torchcodec simultaneously

@hmellor
Member Author

hmellor commented Mar 19, 2025

@Isotr0py managed to get opencv working and added a way to have multiple video loaders in #15055, so yeah we could support both

Collaborator

@Isotr0py Isotr0py left a comment

Otherwise LGTM!

@Isotr0py
Collaborator

Isotr0py commented Mar 19, 2025

Performance comparison between decord, opencv and torchcodec

Tested on Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
Script: https://gist.github.com/Isotr0py/33c9712055b475df7784173f2a0f1de0

| Num Frames | Decord | OpenCV | Torchcodec |
| --- | --- | --- | --- |
| 30 | 2.171s | 3.520s | 4.291s |
| 60 | 2.660s | 6.717s | 2.007s |
| 120 | 2.007s | 12.874s | 2.727s |
| 240 | 3.351s | 24.864s | 3.545s |
| 300 | 3.545s | 31.677s | 4.237s |

OpenCV currently performs poorly because we read frames one by one in a loop.

It's fine to replace decord with torchcodec and leave OpenCV as a fallback for aarch64 machines until torchcodec releases an aarch64 wheel.
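A fallback along these lines could be selected from the host platform. The snippet below is a sketch only: `pick_video_backend` and the backend names are hypothetical, and it assumes the missing wheel is limited to Linux on ARM64, as stated in the PR description.

```python
import platform
from typing import Optional


def pick_video_backend(system: Optional[str] = None,
                       machine: Optional[str] = None) -> str:
    """Return a backend name, preferring torchcodec where a wheel is assumed to exist.

    Both arguments default to the current host; they are parameters only so
    the choice can be exercised without running on each platform.
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    # Assumption from the PR description: no torchcodec wheel for Linux ARM64.
    if system == "linux" and machine in {"aarch64", "arm64"}:
        return "opencv"
    return "torchcodec"
```

Calling `pick_video_backend()` with no arguments inspects the current machine via `platform.system()` and `platform.machine()`.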

@Isotr0py
Collaborator

Well, I optimized the OpenCV implementation a bit and it's much faster now, but it seems that the loaded frames differ numerically from decord and torchcodec due to incorrect frame processing.

Script: https://gist.github.com/Isotr0py/33c9712055b475df7784173f2a0f1de0/revisions#diff-7cda49be7ad2d7b5f039fb97386b1095aa5fc85ef2e1c744514a5490de9df530

| Num Frames | Decord | OpenCV | Torchcodec |
| --- | --- | --- | --- |
| 30 | 2.697s | 0.724s | 4.178s |
| 60 | 2.670s | 0.857s | 1.914s |
| 120 | 3.156s | 0.953s | 2.450s |
| 240 | 3.296s | 1.621s | 3.366s |
| 300 | 3.454s | 1.826s | 3.865s |
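To put the second set of timings in relative terms, the short script below computes how many times faster the optimized OpenCV path is than the other two. The numbers are copied from the table above; the helper name `speedups` is just for illustration.

```python
# Timings in seconds from the optimized run above:
# num_frames -> (decord, opencv, torchcodec)
timings = {
    30: (2.697, 0.724, 4.178),
    60: (2.670, 0.857, 1.914),
    120: (3.156, 0.953, 2.450),
    240: (3.296, 1.621, 3.366),
    300: (3.454, 1.826, 3.865),
}


def speedups(table):
    """Return {num_frames: (decord / opencv, torchcodec / opencv)}."""
    return {n: (d / o, t / o) for n, (d, o, t) in table.items()}


for n, (vs_decord, vs_torchcodec) in speedups(timings).items():
    print(f"{n:>3} frames: OpenCV is {vs_decord:.2f}x faster than decord, "
          f"{vs_torchcodec:.2f}x faster than torchcodec")
```

On these numbers OpenCV comes out roughly 1.9x to 3.7x faster than decord, with the gap narrowing as the frame count grows. The comparison is only indicative, since the OpenCV frames were not numerically identical to those from the other two decoders.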

@hmellor
Member Author

hmellor commented Mar 19, 2025

Wow, that's quite the improvement. In that case, shall we just exclusively use opencv? I can't see a reason to support the other two if they are slower and do the same thing.

@Isotr0py
Collaborator

> shall we just exclusively use opencv?

I think we can use only OpenCV for video IO, since it already has the best performance and more flexible requirements, but frames extracted with OpenCV would differ numerically from decord's due to differences in compression standard and decoder implementation (dmlc/decord#108).

@hmellor
Member Author

hmellor commented Mar 20, 2025

One advantage of torchcodec is that it works on GPU. I'm not sure whether this is the case for OpenCV, but access to dedicated video decoding hardware is probably enough of a reason to support both.

Signed-off-by: Harry Mellor <[email protected]>
@hmellor
Member Author

hmellor commented Mar 20, 2025

Closing in favour of #15055

@hmellor hmellor closed this Mar 20, 2025
@hmellor hmellor deleted the remove-decord branch March 26, 2025 11:13

Successfully merging this pull request may close these issues.

[Feature]: Support more video loader