Support all audio formats by converting to FLTP #556
Changes from 25 commits
@@ -23,6 +23,7 @@ extern "C" {
#include <libavutil/imgutils.h>
#include <libavutil/log.h>
#include <libavutil/pixdesc.h>
#include <libswresample/swresample.h>
#include <libswscale/swscale.h>
}
@@ -558,6 +559,12 @@ void VideoDecoder::addAudioStream(int streamIndex) {
      static_cast<int64_t>(streamInfo.codecContext->sample_rate);
  streamMetadata.numChannels =
      static_cast<int64_t>(getNumChannels(streamInfo.codecContext));

  // FFmpeg docs say that the decoder will try to decode natively in this
  // format, if it can. Docs don't say what the decoder does when it doesn't
  // support that format, but it looks like it does nothing, so this probably
  // doesn't hurt.
  streamInfo.codecContext->request_sample_fmt = AV_SAMPLE_FMT_FLTP;
}

// --------------------------------------------------------------------------
@@ -1342,37 +1349,93 @@ void VideoDecoder::convertAudioAVFrameToFrameOutputOnCPU(
      !preAllocatedOutputTensor.has_value(),
      "pre-allocated audio tensor not supported yet.");

  const AVFrame* avFrame = avFrameStream.avFrame.get();
  AVSampleFormat sourceSampleFormat =
      static_cast<AVSampleFormat>(avFrameStream.avFrame->format);
  AVSampleFormat desiredSampleFormat = AV_SAMPLE_FMT_FLTP;

  UniqueAVFrame convertedAVFrame;
  if (sourceSampleFormat != desiredSampleFormat) {
    convertedAVFrame = convertAudioAVFrameSampleFormat(
        avFrameStream.avFrame, sourceSampleFormat, desiredSampleFormat);
  }
  const UniqueAVFrame& avFrame = (sourceSampleFormat != desiredSampleFormat)
      ? convertedAVFrame
      : avFrameStream.avFrame;

  AVSampleFormat format = static_cast<AVSampleFormat>(avFrame->format);
  TORCH_CHECK(
      format == desiredSampleFormat,
      "Something went wrong, the frame didn't get converted to the desired format. ",
      "Desired format = ",
      av_get_sample_fmt_name(desiredSampleFormat),
      "source format = ",
      av_get_sample_fmt_name(format));

  auto numSamples = avFrame->nb_samples; // per channel
  auto numChannels = getNumChannels(avFrame);
  torch::Tensor outputData =
      torch::empty({numChannels, numSamples}, torch::kFloat32);

  AVSampleFormat format = static_cast<AVSampleFormat>(avFrame->format);
  // TODO-AUDIO Implement all formats.
  switch (format) {
    case AV_SAMPLE_FMT_FLTP: {
      uint8_t* outputChannelData = static_cast<uint8_t*>(outputData.data_ptr());
      auto numBytesPerChannel = numSamples * av_get_bytes_per_sample(format);
      for (auto channel = 0; channel < numChannels;
           ++channel, outputChannelData += numBytesPerChannel) {
        memcpy(
            outputChannelData,
            avFrame->extended_data[channel],
            numBytesPerChannel);
      }
      break;
    }
    default:
      TORCH_CHECK(
          false,
          "Unsupported audio format (yet!): ",
          av_get_sample_fmt_name(format));
  uint8_t* outputChannelData = static_cast<uint8_t*>(outputData.data_ptr());
  auto numBytesPerChannel = numSamples * av_get_bytes_per_sample(format);
  for (auto channel = 0; channel < numChannels;
       ++channel, outputChannelData += numBytesPerChannel) {
    memcpy(
        outputChannelData, avFrame->extended_data[channel], numBytesPerChannel);
  }
  frameOutput.data = outputData;
}

UniqueAVFrame VideoDecoder::convertAudioAVFrameSampleFormat(
    const UniqueAVFrame& avFrame,
    AVSampleFormat sourceSampleFormat,
    AVSampleFormat desiredSampleFormat) {
  auto& streamInfo = streamInfos_[activeStreamIndex_];
  const auto& streamMetadata =
      containerMetadata_.allStreamMetadata[activeStreamIndex_];
  int sampleRate = static_cast<int>(streamMetadata.sampleRate.value());

  if (!streamInfo.swrContext) {
    createSwrContext(
        streamInfo, sampleRate, sourceSampleFormat, desiredSampleFormat);
  }

  UniqueAVFrame convertedAVFrame(av_frame_alloc());
  TORCH_CHECK(
      convertedAVFrame,
      "Could not allocate frame for sample format conversion.");
Below: I realize we have previously avoided having any FFmpeg-related pre-proc directives in this file. I wonder however if it's really worth creating a separate util for this one line? I personally find the code easier to read this way, but no strong opinion.

I'd like to consolidate all FFmpeg pre-proc directives in the FFmpeg utility. Most of them are just one line, but we already have 5 or so of them, and I anticipate having more. If we don't consolidate, I think we'd end up having the directives all over. I will consolidate within the FFmpeg utility.

I agree it's not great, and I think the answer may be for us to implement them as methods on our unique wrappers, something like the sketch below. But that will require us to define the wrapper logic ourselves, as opposed to just making them an alias to std::unique_ptr.
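For illustration, a minimal sketch of what such a wrapper-with-methods could look like. The names (OwnedAVFrame, AVFrameDeleter, setChannelLayoutFrom) are assumptions made here, not the project's actual API, and the version gate simply mirrors the one used in this diff:

// Hypothetical sketch only: names are illustrative, not torchcodec's API.
extern "C" {
#include <libavfilter/version.h>
#include <libavutil/frame.h>
}

#include <memory>

struct AVFrameDeleter {
  void operator()(AVFrame* frame) const {
    av_frame_free(&frame);
  }
};

class OwnedAVFrame {
 public:
  OwnedAVFrame() : frame_(av_frame_alloc()) {}

  // Hides the FFmpeg 4 vs. FFmpeg 5+ channel-layout API difference so the
  // decoder code never needs a pre-proc directive.
  void setChannelLayoutFrom(const OwnedAVFrame& other) {
#if LIBAVFILTER_VERSION_MAJOR > 7 // FFmpeg > 4
    frame_->ch_layout = other.frame_->ch_layout;
#else
    frame_->channel_layout = other.frame_->channel_layout;
#endif
  }

  AVFrame* get() const {
    return frame_.get();
  }

 private:
  std::unique_ptr<AVFrame, AVFrameDeleter> frame_;
};

The call site below would then read something like convertedAVFrame.setChannelLayoutFrom(avFrame), with the directive living in a single place.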
#if LIBAVFILTER_VERSION_MAJOR > 7 // FFmpeg > 4
  convertedAVFrame->ch_layout = avFrame->ch_layout;
#else
  convertedAVFrame->channel_layout = avFrame->channel_layout;
#endif
  convertedAVFrame->format = static_cast<int>(desiredSampleFormat);
  convertedAVFrame->sample_rate = avFrame->sample_rate;
  convertedAVFrame->nb_samples = avFrame->nb_samples;

  auto status = av_frame_get_buffer(convertedAVFrame.get(), 0);
  TORCH_CHECK(
      status == AVSUCCESS,
      "Could not allocate frame buffers for sample format conversion: ",
      getFFMPEGErrorStringFromErrorCode(status));

  auto numSampleConverted = swr_convert(
      streamInfo.swrContext.get(),
      convertedAVFrame->data,
      convertedAVFrame->nb_samples,
      (const uint8_t**)avFrame->data,
      avFrame->nb_samples);
  TORCH_CHECK(
      numSampleConverted > 0,
      "Error in swr_convert: ",
      getFFMPEGErrorStringFromErrorCode(numSampleConverted));

  return convertedAVFrame;
}

// --------------------------------------------------------------------------
// OUTPUT ALLOCATION AND SHAPE CONVERSION
// --------------------------------------------------------------------------
@@ -1606,6 +1669,56 @@ void VideoDecoder::createSwsContext(
  streamInfo.swsContext.reset(swsContext);
}

void VideoDecoder::createSwrContext(
    StreamInfo& streamInfo,
    int sampleRate,
    AVSampleFormat sourceSampleFormat,
    AVSampleFormat desiredSampleFormat) {
  SwrContext* swrContext = nullptr;

  int status = AVSUCCESS;
#if LIBAVFILTER_VERSION_MAJOR > 7 // FFmpeg > 4
  AVChannelLayout layout = streamInfo.codecContext->ch_layout;
  status = swr_alloc_set_opts2(
      &swrContext,
      &layout,
      desiredSampleFormat,
      sampleRate,
      &layout,
      sourceSampleFormat,
      sampleRate,
      0,
      nullptr);

  TORCH_CHECK(
      status == AVSUCCESS,
      "Couldn't create SwrContext: ",
      getFFMPEGErrorStringFromErrorCode(status));
#else
  int64_t layout =
      static_cast<int64_t>(streamInfo.codecContext->channel_layout);
  swrContext = swr_alloc_set_opts(
      nullptr,
      layout,
      desiredSampleFormat,
      sampleRate,
      layout,
      sourceSampleFormat,
      sampleRate,
      0,
      nullptr);
#endif

  TORCH_CHECK(swrContext != nullptr, "Couldn't create swrContext");

  status = swr_init(swrContext);
  TORCH_CHECK(
      status == AVSUCCESS,
      "Couldn't initialize SwrContext: ",
      getFFMPEGErrorStringFromErrorCode(status));
  streamInfo.swrContext.reset(swrContext);
}

// --------------------------------------------------------------------------
// PTS <-> INDEX CONVERSIONS
// --------------------------------------------------------------------------
Above: I wasn't able to use the call to convertAudioAVFrameSampleFormat within the ternary expression.

The alternative would be to define something like the helper sketched below, and then call it in place of the ternary. That's not necessarily better: there's more ceremony in declaring and defining a new function (although it does not need to be a class member function), and then it's not necessarily obvious what's going on with the sample format comparison.
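For illustration only (the helper name here is an assumption, not code from this PR), the shape being floated is roughly:

// Hypothetical shape of the helper being discussed: it decides internally
// whether a conversion is needed and always hands back the frame to use.
UniqueAVFrame convertAVFrameToDesiredFormatIfNeeded(
    const UniqueAVFrame& avFrame,
    AVSampleFormat sourceSampleFormat,
    AVSampleFormat desiredSampleFormat);

// Intended call site, replacing the explicit ternary:
// UniqueAVFrame avFrame = convertAVFrameToDesiredFormatIfNeeded(
//     avFrameStream.avFrame, sourceSampleFormat, desiredSampleFormat);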
Actually, I just realized that convertAudioAVFrameSampleFormat() already has that exact signature. I think we could simplify things a lot if convertAudioAVFrameSampleFormat() became maybeConvertAudioAVFrameSampleFormat() (or a better name?) and only did the conversion if needed.

I wasn't able to make it work. The function cannot take a UniqueAVFrame avFrame as input; it has to be a ref (or const ref). Then it has to return either avFrame (which is a ref) or convertedAVFrame, which is a UniqueAVFrame allocated within that function. I tried a few things, but nothing worked. It's possible I'm doing something wrong.
For example, an attempt along the lines of the sketch below yields a compile error.
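As a hedged reconstruction of the kind of attempt described (not the exact code that was tried): with a const-ref parameter, the early-return path has no valid way to hand the frame back, because UniqueAVFrame is an alias to a move-only std::unique_ptr and a const reference can be neither copied nor moved from.

// Hypothetical reconstruction of such an attempt; this does NOT compile.
UniqueAVFrame VideoDecoder::maybeConvertAudioAVFrameSampleFormat(
    const UniqueAVFrame& avFrame,
    AVSampleFormat sourceSampleFormat,
    AVSampleFormat desiredSampleFormat) {
  if (sourceSampleFormat == desiredSampleFormat) {
    // Error: returning by value would copy the UniqueAVFrame, but its copy
    // constructor is deleted, and a const ref cannot be moved from.
    return avFrame;
  }
  // Delegating to the existing conversion routine is fine: the returned
  // local UniqueAVFrame can be moved out.
  return convertAudioAVFrameSampleFormat(
      avFrame, sourceSampleFormat, desiredSampleFormat);
}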
I think there may be a way to do it, but it requires doing some std::move() calls. I've had to do that on both sides: going in and out of functions. Let's commit this code as-is, and I can try playing with it later.
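For illustration, a minimal sketch of what that std::move-based variant could look like, under the assumption that ownership is passed in by value and moved back out; the name, reduced signature, and call site are illustrative, not the code that eventually landed:

// Hypothetical sketch: pass ownership in by value and rely on the implicit
// move that applies when returning a by-value parameter. Not code from this PR.
UniqueAVFrame VideoDecoder::maybeConvertAudioAVFrameSampleFormat(
    UniqueAVFrame avFrame, // callers must std::move() into this parameter
    AVSampleFormat desiredSampleFormat) {
  AVSampleFormat sourceSampleFormat =
      static_cast<AVSampleFormat>(avFrame->format);
  if (sourceSampleFormat == desiredSampleFormat) {
    return avFrame; // by-value parameters are implicitly moved on return
  }
  return convertAudioAVFrameSampleFormat(
      avFrame, sourceSampleFormat, desiredSampleFormat);
}

// Call site: ownership moves in, and the frame to use moves back out.
// UniqueAVFrame avFrame = maybeConvertAudioAVFrameSampleFormat(
//     std::move(avFrameStream.avFrame), AV_SAMPLE_FMT_FLTP);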