Support all audio formats by converting to FLTP #556
Conversation
}
const UniqueAVFrame& avFrame = (sourceSampleFormat != desiredSampleFormat)
    ? convertedAVFrame
    : avFrameStream.avFrame;
Above: I wasn't able to use the call to convertAudioAVFrameSampleFormat within the ternary expression.
The alternative would be to define something like:
UniqueAVFrame maybeConvertAVFrame(
    UniqueAVFrame frame,
    AVSampleFormat source,
    AVSampleFormat desired) {
  if (source != desired) {
    return convertAudioAVFrameSampleFormat(frame, source, desired);
  } else {
    return frame;
  }
}
And then called with:
UniqueAVFrame avFrame = maybeConvertAVFrame(
avFrameStream.avFrame,
sourceSampleFormat,
desiredSampleFormat);
That's not necessarily better: there's more ceremony in declaring and defining a new function (although it would not need to be a class member function), and it's not necessarily obvious what's going on with the sample format comparison.
Actually, I just realized that convertAudioAVFrameSampleFormat() already has that exact signature. I think we could simplify things a lot if convertAudioAVFrameSampleFormat() became maybeConvertAudioAVFrameSampleFormat() (or a better name?) and only did the conversion if needed.
I wasn't able to make it work. The function cannot take a UniqueAVFrame avFrame as input; it has to be a ref (or const ref). Then it has to return either avFrame (which is a ref) or convertedAVFrame, which is a UniqueAVFrame allocated within that function. I tried a few things, but nothing worked. It's possible I'm doing something wrong.
For example, the following:
diff --git a/src/torchcodec/decoders/_core/VideoDecoder.cpp b/src/torchcodec/decoders/_core/VideoDecoder.cpp
index 9871db6..3003792 100644
--- a/src/torchcodec/decoders/_core/VideoDecoder.cpp
+++ b/src/torchcodec/decoders/_core/VideoDecoder.cpp
@@ -1361,14 +1361,8 @@ void VideoDecoder::convertAudioAVFrameToFrameOutputOnCPU(
static_cast<AVSampleFormat>(avFrameStream.avFrame->format);
AVSampleFormat desiredSampleFormat = AV_SAMPLE_FMT_FLTP;
- UniqueAVFrame convertedAVFrame;
- if (sourceSampleFormat != desiredSampleFormat) {
- convertedAVFrame = convertAudioAVFrameSampleFormat(
- avFrameStream.avFrame, sourceSampleFormat, desiredSampleFormat);
- }
- const UniqueAVFrame& avFrame = (sourceSampleFormat != desiredSampleFormat)
- ? convertedAVFrame
- : avFrameStream.avFrame;
+ const UniqueAVFrame& avFrame = convertAudioAVFrameSampleFormat(
+ avFrameStream.avFrame, sourceSampleFormat, desiredSampleFormat);
AVSampleFormat format = static_cast<AVSampleFormat>(avFrame->format);
TORCH_CHECK(
@@ -1397,9 +1391,11 @@ void VideoDecoder::convertAudioAVFrameToFrameOutputOnCPU(
UniqueAVFrame VideoDecoder::convertAudioAVFrameSampleFormat(
const UniqueAVFrame& avFrame,
AVSampleFormat sourceSampleFormat,
- AVSampleFormat desiredSampleFormat
+ AVSampleFormat desiredSampleFormat) {
+ if (sourceSampleFormat == desiredSampleFormat) {
+ return avFrame;
+ }
-) {
auto& streamInfo = streamInfos_[activeStreamIndex_];
const auto& streamMetadata =
containerMetadata_.allStreamMetadata[activeStreamIndex_];
yields:
/home/nicolashug/dev/torchcodec/src/torchcodec/decoders/_core/VideoDecoder.cpp:1396:12: error: use of deleted function ‘std::unique_ptr<_Tp, _Dp>::unique_ptr(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = AVFrame; _Dp = facebook::torchcodec::Deleterp<AVFrame, void, av_frame_free>]’
1396 | return avFrame;
| ^~~~~~~
I think there may be a way to do it, but it requires doing some std::move() calls. I've had to do that on both sides: going in and out of functions. Let's commit this code as-is, and I can try playing with it later.
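For what it's worth, a pass-by-value variant does compile, because returning a by-value unique_ptr parameter moves rather than copies. Here is a minimal sketch with stand-in types (Frame, UniqueFrame, maybeConvert, and convertFormat are all illustrative names, not from the codebase, where UniqueAVFrame is assumed to be a std::unique_ptr alias with a custom deleter):

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Stand-in for AVFrame; UniqueFrame mimics UniqueAVFrame.
struct Frame {
  int format = 0;
};
using UniqueFrame = std::unique_ptr<Frame>;

// Simulates the sample-format conversion: allocates a fresh frame.
UniqueFrame convertFormat(const UniqueFrame& src, int desiredFormat) {
  auto out = std::make_unique<Frame>();
  out->format = desiredFormat;
  return out;
}

// Taking ownership by value sidesteps the deleted copy constructor:
// the caller std::move()s the frame in, and returning the by-value
// parameter triggers an implicit move rather than a copy.
UniqueFrame maybeConvert(UniqueFrame frame, int desiredFormat) {
  if (frame->format != desiredFormat) {
    return convertFormat(frame, desiredFormat);
  }
  return frame;  // implicit move on return, no copy needed
}
```

The cost is that the call site has to give up ownership explicitly, e.g. `maybeConvert(std::move(avFrameStream.avFrame), desired)`, which is the "std::move on both sides" issue noted above.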
TORCH_CHECK(
    convertedAVFrame,
    "Could not allocate frame for sample format conversion.");
Below: I realize we have previously avoided having any FFmpeg-related pre-processor directives in this file. I wonder, however, if it's really worth creating a separate util for this one line. I personally find the code easier to read this way, but no strong opinion.
I'd like to consolidate all FFmpeg pre-processor directives in the FFmpeg utility. Most of them are just one line, but we already have 5 or so of them, and I anticipate having more. If we don't consolidate, I think we'd end up having the directives all over VideoDecoder.cpp, which I think harms readability.
I will consolidate within FFmpegCommon.h, but I think we could approach this on a more case-by-case basis. If code readability is the main factor, then in this specific instance I personally do not think that the resulting code is more readable:
setChannelLayout(convertedAVFrame, avFrame);
convertedAVFrame->format = static_cast<int>(desiredSampleFormat);
convertedAVFrame->sample_rate = avFrame->sample_rate;
convertedAVFrame->nb_samples = avFrame->nb_samples;
I.e. it's not obvious that setChannelLayout just sets a single field and is part of the same logic as the three following lines.
I agree it's not great, and I think the answer may be for us to implement them as methods on our unique wrappers. Something like:
convertedAVFrame->setChannelLayout(srcFrame);
But that will require us to define the wrapper logic ourselves, as opposed to just making them an alias to std::unique_ptr.
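As a rough sketch of that idea, assuming a hand-rolled RAII wrapper instead of the std::unique_ptr alias (AVFrameStub and UniqueAVFrameWrapper are illustrative stand-ins, not real codebase types):

```cpp
#include <cassert>
#include <memory>

// Stub standing in for FFmpeg's AVFrame.
struct AVFrameStub {
  int channel_layout = 0;
  int sample_rate = 0;
};

// Hypothetical owning wrapper: by defining the wrapper ourselves
// instead of aliasing std::unique_ptr, we can attach helper methods.
class UniqueAVFrameWrapper {
 public:
  UniqueAVFrameWrapper() : frame_(std::make_unique<AVFrameStub>()) {}

  // Copies the channel layout from a source frame. With real FFmpeg,
  // the version-dependent pre-processor directives would live here,
  // out of sight of VideoDecoder.cpp.
  void setChannelLayout(const UniqueAVFrameWrapper& src) {
    frame_->channel_layout = src.frame_->channel_layout;
  }

  // Arrow access still reaches the raw fields, like unique_ptr does.
  AVFrameStub* operator->() { return frame_.get(); }
  const AVFrameStub* operator->() const { return frame_.get(); }

 private:
  std::unique_ptr<AVFrameStub> frame_;
};
```

One wrinkle: with a wrapper like this, the helper is called with a dot (`convertedAVFrame.setChannelLayout(srcFrame)`), while raw field access keeps the arrow via `operator->`.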
Towards #549
This PR allows the audio decoder to support non-fltp formats. We still return fltp, we just automatically convert the frames' formats if needed.
The conversion is done with libswresample, which we'll also use to do resampling soon. We could alternatively use filtergraph. I think we should explore filtergraph eventually, and choose whichever library is the fastest (possibly with a backend switch, like what we currently do for videos).