[Audio.transcribe] JsonDecodeError when printing vtt from m4a #243

Closed
sheikheddy opened this issue Mar 1, 2023 · 20 comments
Labels
bug Something isn't working

Comments

@sheikheddy

sheikheddy commented Mar 1, 2023

Describe the bug

This section of the codebase expects JSON even when the response_format is not JSON:

https://github.com/openai/openai-python/blob/75c90a71e88e4194ce22c71edeb3d2dee7f6ac93/openai/api_requestor.py#L668C7-L673

I think I can contribute a quick bug fix PR today!

To Reproduce

  1. Open an m4a file in a Jupyter notebook (Python 3.10.10)
  2. Transcribe with whisper-1
  3. Print transcript

Stack trace:

JSONDecodeError                           Traceback (most recent call last)
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\openai\api_requestor.py:669, in APIRequestor._interpret_response_line(self, rbody, rcode, rheaders, stream)
    668 try:
--> 669     data = json.loads(rbody)
    670 except (JSONDecodeError, UnicodeDecodeError) as e:

File C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\json\__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:

File C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\json\decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()

File C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\json\decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None

Code snippets

import openai

f = open("testing.m4a", "rb")
transcript = openai.Audio.transcribe("whisper-1", f, response_format="vtt")
print(transcript)


https://github.com/openai/openai-python/blob/75c90a71e88e4194ce22c71edeb3d2dee7f6ac93/openai/api_requestor.py#L668C7-L673

OS

Windows 11

Python version

Python v3.10.10

Library version

openai-python 0.27.0

@sheikheddy sheikheddy added the bug Something isn't working label Mar 1, 2023
@sheikheddy
Author

Also happens with srt

JSONDecodeError                           Traceback (most recent call last)
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\openai\api_requestor.py:669, in APIRequestor._interpret_response_line(self, rbody, rcode, rheaders, stream)
    668 try:
--> 669     data = json.loads(rbody)
    670 except (JSONDecodeError, UnicodeDecodeError) as e:

File C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\json\__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:

File C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\json\decoder.py:340, in JSONDecoder.decode(self, s, _w)
    339 if end != len(s):
--> 340     raise JSONDecodeError("Extra data", s, end)
    341 return obj

JSONDecodeError: Extra data: line 2 column 1 (char 2)

The above exception was the direct cause of the following exception:

APIError                                  Traceback (most recent call last)

@sheikheddy
Author

And for text:

JSONDecodeError                           Traceback (most recent call last)
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\openai\api_requestor.py:669, in APIRequestor._interpret_response_line(self, rbody, rcode, rheaders, stream)
    668 try:
--> 669     data = json.loads(rbody)
    670 except (JSONDecodeError, UnicodeDecodeError) as e:

File C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\json\__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:

File C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\json\decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()

File C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\json\decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
...
    675 # In the future, we might add a "status" parameter to errors
    676 # to better handle the "error while streaming" case.

@sheikheddy
Author

Seems similar to #216

@mpokrass
Collaborator

mpokrass commented Mar 1, 2023

thanks for reporting @sheikheddy - we seem to have missed this. please tag me if you do end up putting a pr up!

@sheikheddy
Author

sheikheddy commented Mar 2, 2023

Background

  1. The Whisper repo uses ResultWriter in utils.py. That's missing here.

  2. This is the relevant snippet before going into _interpret_response_line:

    result = self.request_raw(
        method.lower(),
        url,
        params=params,
        supplied_headers=headers,
        files=files,
        stream=stream,
        request_id=request_id,
        request_timeout=request_timeout,
    )
    resp, got_stream = self._interpret_response(result, stream)
    return resp, got_stream, self.api_key

  3. Whisper uses a get_writer function, then writes the result to an output path: https://github.com/openai/whisper/blob/da600abd2b296a5450770b872c3765d0a5a5c769/whisper/transcribe.py#L313-L317. But in this repo, the result gets passed to _interpret_response and wrapped in an OpenAIResponse.

  4. A long time ago, this repo was forked from the Stripe Python library, and if you look at their APIRequestor, it's a lot cleaner: https://github.com/stripe/stripe-python/blob/master/stripe/api_requestor.py#L385.

Discussion

Fundamentally, the problem here is that we're violating the assumption that the request always returns a JSON encoded response.

My proposal is to prepare requests to Whisper such that they always return serialized JSON, and if the user wants a vtt or srt file, we do the conversion on the client inside audio.py.

Concretely, a call to transcribe would first use transcribe_raw to get the raw JSON, then process it according to response_format, and finally return the content in the format the caller requested.
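As a rough sketch of what that conversion step could look like (assuming the verbose_json response gives us a segments list with start, end, and text fields; segments_to_vtt and format_timestamp are placeholder names, nothing that exists in audio.py today):

def format_timestamp(seconds):
    # Render seconds as an HH:MM:SS.mmm timestamp for VTT cues.
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"

def segments_to_vtt(segments):
    # Build a WEBVTT document from verbose_json-style segments (start, end, text).
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}")
        lines.append(seg["text"].strip())
        lines.append("")
    return "\n".join(lines)

An equivalent segments_to_srt would differ only in cue numbering and the timestamp separator, so every output format could share the one JSON fetch.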

Evaluation

Let's consider our options and tradeoffs in terms of the user experience, product roadmap, and technical architecture.

In some cases, it may be more beneficial to perform formatting on the client-side, especially when there are multiple clients consuming the same API resource, but each client requires a different data format. For instance, if Alice and Bob request a transcription of an identical audio file, Alice may require the transcription in vtt format, while Bob may need it in srt format.

It is more efficient to store a single compact canonical representation instead of multiple format-specific ones. When sending text over the wire, a compact encoding that the client deserializes (here, into JSON) can also be cheaper than the I/O overhead of shipping the fully formatted text directly.

In the future, as we support passing URLs to remote files, we may see a pattern of a few popular, large-size tracks being transcribed frequently, while the majority of requests will be for smaller one-off transcriptions.

It's worth noting that we can cache the result of the first request and serve it from the cache on the second request. Cache management is more complex in practice, but let's keep it simple for now.
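Just to illustrate the caching idea (a toy sketch only, keyed by a content hash; none of this exists in the library):

import hashlib

_cache = {}

def cached_transcribe(file_bytes, transcribe_fn):
    # Reuse the single JSON transcription for repeated requests of the same audio.
    key = hashlib.sha256(file_bytes).hexdigest()
    if key not in _cache:
        _cache[key] = transcribe_fn(file_bytes)
    return _cache[key]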

Biggest drawback I can see is that converting large files might be slow, especially on older hardware, and we should be careful to control the footprint of client workloads.

Alternatives

One potential fix may be to check and enforce the Content-Type HTTP header. Another might be to band-aid over this with a hack that peeks into rbody to check the type. I don't think either of those would be good design decisions in this case, but I'm open to other perspectives, and it's always possible I'm missing something.
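For reference, the kind of band-aid I mean would be something like this (sketch only, roughly where the json.loads call lives today):

try:
    data = json.loads(rbody)
except JSONDecodeError:
    # Non-JSON response_format (text/srt/vtt) ends up here as the raw body.
    data = rbody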

If nobody suggests something more clever/elegant/scalable, I'll just implement the best I can come up with right now and get that reviewed.

@guaguaguaxia
Contributor

Can anybody fix this problem quickly? I really need it 😢

@guaguaguaxia
Contributor

I'll try to fix it and create a PR

@guaguaguaxia
Contributor

This is my PR: #277

@lordbritish888

This is my PR: #277

Still fails in precisely the same way :/

@guaguaguaxia
Contributor

This is my PR: #277

Still fails in precisely the same way :/

OK, I'll check it ASAP

@guaguaguaxia
Contributor

This is my PR: #277

Still fails in precisely the same way :/

Could you tell me how you test it? I don't know how to test it locally.

@lordbritish888

I am on Windows with Python 3.10. Can I find you in a discord or telegram channel?

@guaguaguaxia
Contributor

I am on Windows with Python 3.10. Can I find you in a discord or telegram channel?

+86 13048282493 telegram

@MattFisher

MattFisher commented Mar 7, 2023

I checked and the Content-Type header on the response is always set correctly (at least for the audio transcription responses), so we can deduce whether to json-decode within APIRequestor._interpret_response without any other code changes.

For the json and verbose_json response formats, the Content-Type is 'application/json', while for text, srt, and vtt, the Content-Type is 'text/plain; charset=utf-8'.

if rheaders.get('Content-Type', '') == 'application/json':
    data = json.loads(rbody)
else:
    # text, srt, and vtt come back as text/plain, so keep the raw body
    data = rbody

Alternatively if we want to be more conservative in case responses from other parts of the API might not have the Content-Type header set correctly to application/json for json responses, we could go the other way:

# default to '' so a missing Content-Type header doesn't raise a TypeError
if 'text/plain' in rheaders.get('Content-Type', ''):
    data = rbody
else:
    data = json.loads(rbody)

@lordbritish888

What is your handle on Telegram? The number is not working.

@guaguaguaxia
Contributor

@MattFisher please review it in #282, I don't know how to close it in the same PR 😭

@lordbritish888

If you need assistance reproducing the error or more details, I made this TG channel for us: https://t.me/guagua_mfisher

@guaguaguaxia
Contributor

If you need assistance reproducing the error or more details, I made this TG channel for us: https://t.me/guagua_mfisher

I'm in

@guaguaguaxia
Contributor

[image]
@lordbritish888 I can't say anything and can't see your profile

@lordbritish888

[image] @lordbritish888 I can't say anything and can't see your profile

Just made you an admin @guaguaguaxia
