Support all audio formats by converting to FLTP #556
Conversation
}
const UniqueAVFrame& avFrame = (sourceSampleFormat != desiredSampleFormat)
    ? convertedAVFrame
    : avFrameStream.avFrame;
Above: I wasn't able to use the call to convertAudioAVFrameSampleFormat within the ternary expression.
The alternative would be to define something like:
UniqueAVFrame maybeConvertAVFrame(
    UniqueAVFrame frame,
    AVSampleFormat source,
    AVSampleFormat desired) {
  if (source != desired) {
    return convertAudioAVFrameSampleFormat(frame, source, desired);
  } else {
    return frame;
  }
}
And then called with:
UniqueAVFrame avFrame = maybeConvertAVFrame(
avFrameStream.avFrame,
sourceSampleFormat,
desiredSampleFormat);
That's not necessarily better: there's more ceremony in declaring and defining a new function (although it would not need to be a class member function), and it's not necessarily obvious what's going on with the sample format comparison.
Actually, I just realized that convertAudioAVFrameSampleFormat() already has that exact signature. I think we could simplify things a lot if convertAudioAVFrameSampleFormat() became maybeConvertAudioAVFrameSampleFormat() (or a better name?) and only did the conversion if needed.
I wasn't able to make it work. The function cannot take a UniqueAVFrame avFrame as input; it has to be a ref (or const ref). Then it has to return either avFrame (which is a ref) or convertedAVFrame, which is a UniqueAVFrame allocated within that function. I tried a few things, but nothing worked. It's possible I'm doing something wrong.
For example, the following:
diff --git a/src/torchcodec/decoders/_core/VideoDecoder.cpp b/src/torchcodec/decoders/_core/VideoDecoder.cpp
index 9871db6..3003792 100644
--- a/src/torchcodec/decoders/_core/VideoDecoder.cpp
+++ b/src/torchcodec/decoders/_core/VideoDecoder.cpp
@@ -1361,14 +1361,8 @@ void VideoDecoder::convertAudioAVFrameToFrameOutputOnCPU(
static_cast<AVSampleFormat>(avFrameStream.avFrame->format);
AVSampleFormat desiredSampleFormat = AV_SAMPLE_FMT_FLTP;
- UniqueAVFrame convertedAVFrame;
- if (sourceSampleFormat != desiredSampleFormat) {
- convertedAVFrame = convertAudioAVFrameSampleFormat(
- avFrameStream.avFrame, sourceSampleFormat, desiredSampleFormat);
- }
- const UniqueAVFrame& avFrame = (sourceSampleFormat != desiredSampleFormat)
- ? convertedAVFrame
- : avFrameStream.avFrame;
+ const UniqueAVFrame& avFrame = convertAudioAVFrameSampleFormat(
+ avFrameStream.avFrame, sourceSampleFormat, desiredSampleFormat);
AVSampleFormat format = static_cast<AVSampleFormat>(avFrame->format);
TORCH_CHECK(
@@ -1397,9 +1391,11 @@ void VideoDecoder::convertAudioAVFrameToFrameOutputOnCPU(
UniqueAVFrame VideoDecoder::convertAudioAVFrameSampleFormat(
const UniqueAVFrame& avFrame,
AVSampleFormat sourceSampleFormat,
- AVSampleFormat desiredSampleFormat
+ AVSampleFormat desiredSampleFormat) {
+ if (sourceSampleFormat == desiredSampleFormat) {
+ return avFrame;
+ }
-) {
auto& streamInfo = streamInfos_[activeStreamIndex_];
const auto& streamMetadata =
containerMetadata_.allStreamMetadata[activeStreamIndex_];
yields:
/home/nicolashug/dev/torchcodec/src/torchcodec/decoders/_core/VideoDecoder.cpp:1396:12: error: use of deleted function ‘std::unique_ptr<_Tp, _Dp>::unique_ptr(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = AVFrame; _Dp = facebook::torchcodec::Deleterp<AVFrame, void, av_frame_free>]’
1396 | return avFrame;
| ^~~~~~~
I think there may be a way to do it, but it requires doing some std::move() calls. I've had to do that on both sides: going in and out of functions. Let's commit this code as-is, and I can try playing with it later.
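For what it's worth, a pass-by-value variant does compile, because returning a by-value unique_ptr parameter moves rather than copies. Here is a minimal sketch with stand-in types (Frame, UniqueFrame, maybeConvert, and convertFormat are all illustrative names, not from the codebase, where UniqueAVFrame is assumed to be a std::unique_ptr alias with a custom deleter):

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Stand-in for AVFrame; UniqueFrame mimics UniqueAVFrame.
struct Frame {
  int format = 0;
};
using UniqueFrame = std::unique_ptr<Frame>;

// Simulates the sample-format conversion: allocates a fresh frame.
UniqueFrame convertFormat(const UniqueFrame& src, int desiredFormat) {
  auto out = std::make_unique<Frame>();
  out->format = desiredFormat;
  return out;
}

// Taking ownership by value sidesteps the deleted copy constructor:
// the caller std::move()s the frame in, and returning the by-value
// parameter triggers an implicit move rather than a copy.
UniqueFrame maybeConvert(UniqueFrame frame, int desiredFormat) {
  if (frame->format != desiredFormat) {
    return convertFormat(frame, desiredFormat);
  }
  return frame;  // implicit move on return, no copy needed
}
```

The cost is that the call site has to give up ownership explicitly, e.g. `maybeConvert(std::move(avFrameStream.avFrame), desired)`, which is the "std::move on both sides" issue noted above.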
TORCH_CHECK(
    convertedAVFrame,
    "Could not allocate frame for sample format conversion.");
Below: I realize we have previously avoided having any FFmpeg-related pre-processor directives in this file. I wonder, however, if it's really worth creating a separate util for this one line. I personally find the code easier to read this way, but no strong opinion.
I'd like to consolidate all FFmpeg pre-processor directives in the FFmpeg utility. Most of them are just one line, but we already have 5 or so of them, and I anticipate having more. If we don't consolidate, I think we'd end up having the directives all over VideoDecoder.cpp, which I think harms readability.
I will consolidate within FFmpegCommon.h, but I think we could approach this on a more case-by-case basis. If code readability is the main factor, then in this specific instance I personally do not think that the resulting code is more readable:
setChannelLayout(convertedAVFrame, avFrame);
convertedAVFrame->format = static_cast<int>(desiredSampleFormat);
convertedAVFrame->sample_rate = avFrame->sample_rate;
convertedAVFrame->nb_samples = avFrame->nb_samples;
I.e. it's not obvious that setChannelLayout just sets a single field and is part of the same logic as the three following lines.
I agree it's not great, and I think the answer may be for us to implement them as methods on our unique wrappers. Something like:
convertedAVFrame->setChannelLayout(srcFrame);
But that will require us to define the wrapper logic ourselves, as opposed to just making them an alias to std::unique_ptr.
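As a rough sketch of that idea, assuming a hand-rolled RAII wrapper instead of the std::unique_ptr alias (AVFrameStub and UniqueAVFrameWrapper are illustrative stand-ins, not real codebase types):

```cpp
#include <cassert>
#include <memory>

// Stub standing in for FFmpeg's AVFrame.
struct AVFrameStub {
  int channel_layout = 0;
  int sample_rate = 0;
};

// Hypothetical owning wrapper: by defining the wrapper ourselves
// instead of aliasing std::unique_ptr, we can attach helper methods.
class UniqueAVFrameWrapper {
 public:
  UniqueAVFrameWrapper() : frame_(std::make_unique<AVFrameStub>()) {}

  // Copies the channel layout from a source frame. With real FFmpeg,
  // the version-dependent pre-processor directives would live here,
  // out of sight of VideoDecoder.cpp.
  void setChannelLayout(const UniqueAVFrameWrapper& src) {
    frame_->channel_layout = src.frame_->channel_layout;
  }

  // Arrow access still reaches the raw fields, like unique_ptr does.
  AVFrameStub* operator->() { return frame_.get(); }
  const AVFrameStub* operator->() const { return frame_.get(); }

 private:
  std::unique_ptr<AVFrameStub> frame_;
};
```

One wrinkle: with a wrapper like this, the helper is called with a dot (`convertedAVFrame.setChannelLayout(srcFrame)`), while raw field access keeps the arrow via `operator->`.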
Towards #549
This PR allows the audio decoder to support non-fltp formats. We still return fltp, we just automatically convert the frames' formats if needed.
The conversion is done with libswresample, which we'll also use to do resampling soon. We could alternatively use filtergraph. I think we should explore filtergraph eventually, and choose whichever library is the fastest (possibly with a backend switch, like what we currently do for videos).