[Whisper] Audio format errors on valid file #333

muddi900 · 2023-03-22T12:15:09Z

Describe the bug

Hello

I am trying to integrate the Whisper API into my Flask app. However, when I pass in the file received from the Flask endpoint, I get the following error:

openai.error.InvalidRequestError: Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']

However, loading the file in the interactive console works fine.

In [16]: r = openai.Audio.transcribe('whisper-1',open('../Downloads/sample.mp3','rb'))

In [17]: r
Out[17]:
<OpenAIObject at 0x192993c6750> JSON: {
  "text": "This episode is actually a co-production with another podcast called Digital Folklore, which is hosted by Mason Amadeus and Perry Carpenter. We've been doing a lot of our research together and our brainstorming sessions have been so thought-provoking, I wanted to bring them on so we could discuss the genre of analog horror together. So, why don't you guys introduce yourselves so we know who's who? Yeah, this is Perry Carpenter and I'm one of the hosts of Digital Folklore. And I'm Mason Amadeus and I'm the other host of Digital Folklore. And tell me, what is Digital Folklore? Yeah, so Digital Folklore is the evolution of folklore, you know, the way that we typically think about it. And folklore really is the product of basically anything that humans create that doesn't have a centralized canon. But when we talk about digital folklore, we're talking about..."
}

To Reproduce

Create a Flask app.
Add an endpoint that receives a valid audio file.
Pass the bytes of the file to the openai.Audio.transcribe method via `request.files[fileName].stream.read()`.

Code snippets

The endpoint code:


with tempfile.TemporaryFile() as temp_file:
    temp_file.write(audio_file)
    transcript_read = openai.Audio.transcribe("whisper-1", temp_file)
return transcript_read

The ffprobe info of the file:

ffprobe version 4.4.1-full_build-www.gyan.dev Copyright (c) 2007-2021 the FFmpeg developers
  built with gcc 11.2.0 (Rev1, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libdav1d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libglslang --enable-vulkan --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Input #0, mp3, from '.\Downloads\sample.mp3':
  Metadata:
    title           : Monsters in the Static
    comment         : We look at the subgenre of analog horror, where something sinister might be lurking in the horizontal lines and vertical holds of those old VHS tapes.
    lyrics-ENG      : <p>In the subgenre of analog horror, there’s something sinister or supernatural lurking in the horizontal lines and vertical holds in those old VHS tapes. Filmmaker <a href="https://wnuf.bigcartel.com/">Chris LaMartina</a> explains why he wanted his mov
    album           : Imaginary Worlds
    genre           : Podcast
    date            : 2020
    encoder         : Lavf58.76.100
  Duration: 00:00:50.05, start: 0.025057, bitrate: 128 kb/s
  Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc58.13

OS

Windows 11

Python version

Python v10.5

Library version

0.27.2


sumeyyeyegen · 2023-03-24T11:27:54Z

Were you able to find a solution? I am getting the same error. It works very well in a Jupyter notebook, but I keep getting this error in the Hugging Face application.

muddi900 · 2023-03-24T16:16:58Z

I have found a workaround using Replicate's implementation. It requires exposing a link to the file, because Replicate only works with hyperlinks. I am hoping the issue will be resolved by the time I go live. If you are testing locally, you can use ngrok for the file link.


IsaacKnowles · 2023-03-27T03:54:31Z

I've run into this today as well, but with the webm audio format, also using Flask. A web app is recording a brief spoken audio clip with mimeType 'audio/webm;codecs=opus' and when I save a copy of the recorded audio to a file, it is submitted to the API and correctly transcribed without issue. However, if I submit the request via my Flask app, I get the same error as @muddi900.

ffprobe info:

ffprobe version 5.1.2 Copyright (c) 2007-2022 the FFmpeg developers
  built with Apple clang version 14.0.0 (clang-1400.0.29.202)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/5.1.2_6 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
Input #0, matroska,webm, from 'sample.webm':
  Metadata:
    encoder         : Chrome
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0(eng): Audio: opus, 48000 Hz, mono, fltp (default)

OS: Mac 12.6.3
Python version: 3.10.1
openai version: 0.27.2

danielfaust · 2023-03-28T11:38:16Z

I was getting this error on some files, and while looking at them, I noticed that they are lacking a proper mp3 header. My solution was to let them run through ffmpeg once before uploading them, by using the acodec='copy' parameter so that the actual audio content of the mp3 file does not get modified:

ffmpeg \
    .input(path) \
    .output('_temp.mp3', acodec='copy') \
    .overwrite_output() \
    .run()

and then I'm uploading _temp.mp3 instead of path.

muddi900 · 2023-03-28T11:55:00Z

Well, the file works fine for me when I use it as a local file. The issue only occurs when the file is uploaded server-side. My current workaround uses local storage, but that would be unfeasible in production: the Flask backend will probably be hosted on an ephemeral system.


sparkle666 · 2023-04-13T11:59:14Z

I also got the same error when loading audio files locally. The first call works:

audio2 = open("Greeti.mp3", "rb")                   
sub = openai.Audio.transcribe("whisper-1", audio2, response_format = "text")

'Greetings.\n'

Changing the response format to a different string returns an error:

sub = openai.Audio.transcribe("whisper-1", audio2, response_format = "srt")

openai.error.InvalidRequestError: Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']
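One possible explanation for the snippet above, not confirmed anywhere in this thread: the second call reuses the same `audio2` handle, whose read position is already at the end of the file after the first call, so the upload may be empty regardless of `response_format`. A minimal sketch of that behavior, using an in-memory buffer as a stand-in for the opened mp3 (the byte content is made up for illustration):

```python
import io

# Stand-in for open("Greeti.mp3", "rb"); the bytes are placeholder data.
audio = io.BytesIO(b"fake audio bytes")

first = audio.read()   # consumes the whole stream
second = audio.read()  # position is at EOF, so nothing is left to read

audio.seek(0)          # rewind before reusing the handle
third = audio.read()   # full content is available again
```

If this is the cause, calling `audio2.seek(0)` (or reopening the file) before the second request would be worth trying.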

grp06 · 2023-04-14T01:00:35Z

I fought with this for a long time. I finally got it working by not using the MediaRecorder() API on the frontend. I switched to using RecordRTC:

  const startRecording = () => {
    setIsRecording(true)
    navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
      const options = {
        type: 'audio',
        mimeType: 'audio/mp3',
        numberOfAudioChannels: 1,
        recorderType: RecordRTC.StereoAudioRecorder,
        checkForInactiveTracks: true,
        timeSlice: 5000,
        ondataavailable: (blob) => {
          socket.emit('audio', { buffer: blob })
        },
      }

      const recordRTC = new RecordRTC(stream, options)
      setRecorder(recordRTC)
      recordRTC.startRecording()
    })
  }

and it worked immediately.

zieen · 2023-04-21T08:18:48Z

How do I deal with this on the server side? It complains about the missing file name.

amoschoomy · 2023-05-01T12:08:45Z

Any updates on this? The request fails inside an endpoint, but works with local files.

calummackervoy · 2023-05-10T14:49:18Z

I have a variation on the RecordRTC solution posted above. It shows how to start and stop recording, reset the audio channel, and send the result with an Ajax request (multipart form data):

if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
        navigator.mediaDevices.getUserMedia({ audio: true })
        .then((stream) => {
            const options = {
                type: 'audio',
                mimeType: 'audio/mp3',
                numberOfAudioChannels: 1,
                recorderType: RecordRTC.StereoAudioRecorder
            }
 
            const recordRTC = new RecordRTC(stream, options);

            $(document).on("click", "#record-button", () => {
                let recordButton = $("#record-button");
                // already recording, hit stop
                if(recordButton.attr("recording") === "true") {
                    recordButton.attr("recording", false)
                    recordButton.html("REC");

                    recordRTC.stopRecording(async () => {
                        let blob = await recordRTC.getBlob();
                        var form = new FormData();
                        form.append("file", blob);
                        $.ajax({
                            type: "POST",
                            data: form,
                            url: "",
                            processData: false,
                            contentType: false,
                            success: function (data) {
                                // ...
                                recordRTC.reset();
                            },
                            error: (err) => {
                                // ...
                                recordRTC.reset();
                            }
                        });
                    });
                }
                // not recording, hit play
                else {
                    //mediaRecorder.start();
                    recordButton.attr("recording", true);
                    recordButton.html("STOP");

                    recordRTC.startRecording();
                }
            });
        })
        // Error callback
        .catch((err) => {
            console.error(`The following getUserMedia error occurred: ${err}`);
        });
}

odusseys · 2023-08-03T09:21:34Z

I'm having the same issue with an mp3 file written by ffmpeg (LAME mp3).

cquintero4told · 2023-08-04T22:54:23Z

Try with this code:

with tempfile.NamedTemporaryFile(suffix='.mp3') as temp_file:
    temp_file.write(audio_file)
    temp_file.flush()
    temp_file.seek(0)

    transcript_read = openai.Audio.transcribe("whisper-1", temp_file)
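The snippet above differs from the original endpoint code in two ways: the temporary file has a name ending in `.mp3`, and the handle is rewound with `seek(0)` before the upload. If writing to disk is not an option, the same two properties can be given to an in-memory buffer. This is a sketch under the assumption that the 0.27.x client takes the upload filename from the file object's `name` attribute; `named_buffer` is a hypothetical helper, not part of the library.

```python
import io


def named_buffer(data: bytes, filename: str) -> io.BytesIO:
    """Wrap raw bytes in a file-like object that carries a filename.

    Assumption: the client reads `.name` to build the multipart upload,
    so the extension here is what the server sees.
    """
    buf = io.BytesIO(data)  # a fresh BytesIO starts at position 0
    buf.name = filename     # hypothetical name; pick the real extension
    return buf


# Placeholder bytes standing in for request.files[...].stream.read()
upload = named_buffer(b"placeholder audio bytes", "sample.mp3")
```

The resulting `upload` object could then be passed where `temp_file` is used above.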

ilyak1990 · 2023-09-02T02:57:05Z

For me, the problem was specific to iPhone. I was saving what I thought was a valid .wav file (it worked when I tested it), but a file type detector tool showed that Apple was actually saving it in some other format. You can either convert between file types using the Node ffmpeg library or, on iPhone specifically, save the recording as a .m4a file instead of .wav.
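A rough way to run the same kind of check in code is to look at the first bytes of the upload rather than trusting its extension. The sketch below is a heuristic only and covers just a few of the formats mentioned in this thread; `sniff_audio_container` is a hypothetical helper, and real detector tools check far more cases.

```python
def sniff_audio_container(header: bytes) -> str:
    """Guess the container from the leading bytes of a file (heuristic)."""
    # WAV: RIFF chunk whose form type is WAVE
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # MP4 family (mp4, m4a): an 'ftyp' box right after the 4-byte size
    if header[4:8] == b"ftyp":
        return "mp4/m4a"
    # WebM / Matroska: EBML magic number
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return "webm/matroska"
    # MP3 with an ID3v2 tag at the start
    if header[:3] == b"ID3":
        return "mp3"
    # Bare MPEG audio frame sync (11 set bits); ADTS AAC can also match
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```

Reading the first 12 bytes of the uploaded stream is enough for these checks; a result of "unknown" for a file named `.wav` would point to the kind of mismatch described above.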

andora2 · 2023-09-11T16:28:12Z

I was fighting with similar problems.
This currently works on an iPhone 13 (iOS 17) with MediaRecorder:

Client:

...
const recorder = new MediaRecorder(stream, { mimeType: 'audio/mp4' });
...

Server:

...
const formData = new FormData();
formData.append('file', buffer, { filename: "audio.mp4", contentType:  "audio/mp4" });
...
const response = await axios.post('https://api.openai.com/v1/audio/transcriptions', formData, {
    headers: {
        'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
        ...formData.getHeaders(),
    },
});

For Chrome & co. I go with 'audio/webm' on client and 'audio/mp3' on server.

I don't really expect this to be bulletproof, but for the time being it's stable enough for my needs.

BR,
Adrian

thiswillbeyourgithub · 2023-11-08T14:08:54Z

Hi, I think I found a viable workaround. The issue seems to be in the way the BufferReader reads files. Bypassing the BufferReader fixed it for me.

I can't investigate further because of the outage, unfortunately. This might be related to #727.

In this line, if I pass file=open(args.file, "rb") instead of file=buffer_reader, the function returns normally instead of failing with a 400 error.

edit: found a fix and submitted a PR in #733

rattrayalex · 2023-11-10T03:15:32Z

OpenAI audio endpoints generally require a filename with extension be provided in the upload, which is used to determine the file type.

This is made more convenient with the new v1 of the SDK, where you can pass a pathlib.Path to the API:

from pathlib import Path
from openai import OpenAI

openai = OpenAI()

speech_file_path = Path(__file__).parent / "Downloads" / "sample.mp3"

# Pass the Path itself so the filename and extension travel with the upload.
openai.audio.transcriptions.create(model='whisper-1', file=speech_file_path)
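For the Flask case that started this thread, where the audio arrives as bytes rather than as a file on disk, the v1 SDK also accepts a `(filename, contents)` tuple for file parameters. The sketch below is an illustration, not an official recipe: `as_named_upload` and `transcribe_bytes` are hypothetical helpers, and the extension list is copied from the error message quoted at the top of this issue, so it may be out of date.

```python
from pathlib import Path
from typing import Tuple

# Copied from the error message in this thread; may not be current.
SUPPORTED = {"m4a", "mp3", "webm", "mp4", "mpga", "wav", "mpeg"}


def as_named_upload(data: bytes, filename: str) -> Tuple[str, bytes]:
    """Pair raw bytes with a filename whose extension the API can recognize."""
    ext = Path(filename).suffix.lstrip(".").lower()
    if ext not in SUPPORTED:
        raise ValueError(f"unsupported or missing extension: {filename!r}")
    return (filename, data)


def transcribe_bytes(data: bytes, filename: str = "audio.mp3"):
    """Send in-memory audio to the transcription endpoint (needs an API key)."""
    from openai import OpenAI  # imported lazily; requires openai>=1.0

    client = OpenAI()
    return client.audio.transcriptions.create(
        model="whisper-1",
        file=as_named_upload(data, filename),
    )
```

In a Flask handler this could be called as `transcribe_bytes(request.files[fileName].read(), "sample.mp3")`, with the filename chosen to match the actual format of the upload.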

muddi900 added the bug Something isn't working label Mar 22, 2023

hallacy assigned mpokrass Apr 6, 2023

asomervell mentioned this issue Apr 24, 2023

OpenAI's servers don't seem to like audio files from iOS IgnoranceAI/hugh#4

Open

thiswillbeyourgithub mentioned this issue Nov 8, 2023

Uploading JSON to Files API returns invalid file format #727

Closed

thiswillbeyourgithub mentioned this issue Nov 8, 2023

fix(cli/audio): file format detection failing for whisper #733

Merged

rattrayalex closed this as completed Nov 10, 2023
