-
-
Notifications
You must be signed in to change notification settings - Fork 7.7k
Replace decord
with torchcodec
#15022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Harry Mellor <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine to replace decord
with torchcodec
on x86 platform, but it's still meaningful to have some benchmark between torchcodec
and opencv
.
Signed-off-by: Harry Mellor <[email protected]>
|
Signed-off-by: Harry Mellor <[email protected]> Co-authored-by: Isotr0py <[email protected]>
maybe we can support opencv and torchcodec simultaneously |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM!
Performance comparison between
|
Num Frames | Decord | OpenCV | Torchcodec |
---|---|---|---|
30 | 2.171s | 3.520s | 4.291s |
60 | 2.660s | 6.717s | 2.007s |
120 | 2.007s | 12.874s | 2.727s |
240 | 3.351s | 24.864s | 3.545s |
300 | 3.545s | 31.677s | 4.237s |
OpenCV
has poor performance currently because we read frames one by one in iteration.
It's fine to replace decord
with torchcodec
, and leaving OpenCV as a fallback for aarch64 machine before torchcodec
release aarch64 wheel.
Well, I optimized the OpenCV implementation a bit and it's much faster now, but seems that the loaded frames have numberic difference with
|
Wow that's quite the improvement, in that case shall we just exclusively use opencv? I can't see a reason to support the other two if they are slower and do the same thing? |
I think we can use only OpenCV for video IO which already has best performance and more flexible requirements, but frames extracted from OpenCV would have numeric difference with |
One advantage of |
Signed-off-by: Harry Mellor <[email protected]>
Closing in favour of #15055 |
As discussed in Slack.
The documentation for
torchvision.io.read_video
states that PyTorch's video decoding capability will soon be centralised intorchcodec
. Therefore, it makes sense to skip the intermediate step of usingtorchvision
.OpenCV was considered but it can only read videos using a path/url, which meant writing the bytes to disk, which was a deal breaker.
The main caveat of
torchcodec
at the moment is that it does not distribute ARM64 wheel for Linux, see pytorch/torchcodec#569.FIX #15011