[Bugfix][Frontend] support webm with audioread fallback #18477

cpwan · 2025-05-21T10:27:16Z

github-actions · 2025-05-21T10:33:04Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

cpwan · 2025-05-21T10:40:25Z

Btw, I tried to keep the change to the code minimal, but the code on the main branch already fails the pre-commit checks, such as import sorting...

vllm/entrypoints/openai/serving_transcription.py

DarkLight1337 · 2025-05-21T12:36:27Z

@NickLucche does this look good to you?

cpwan · 2025-05-21T12:39:31Z

Let me tidy up the git history a bit.

Signed-off-by: cpwan <[email protected]>

NickLucche

Thanks for taking action on the issue.
To be frank, I am not familiar with audioread, but I see librosa already uses it as a fallback, so maybe this doesn't even need to go into the requirements list.

librosa uses soundfile and audioread for reading audio. As of v0.7, librosa uses soundfile by default, and falls back on audioread only when dealing with codecs unsupported by soundfile. For a list of codecs supported by soundfile, see the libsndfile documentation.

https://github.com/librosa/librosa/blob/e403272fc984bc4aeb316e5f15899042224bb9fe/docs/ioformats.rst#read-specific-formats

Also, I am not familiar with the potential security concerns of opening up to a Matroska-based container format like webm (cc @russellb ).
For the sake of completeness, can we at least test sending some video/image content in webm?

NickLucche · 2025-05-21T12:38:46Z

vllm/entrypoints/openai/serving_transcription.py

+            try:
+                with io.BytesIO(audio_data) as bytes_:
+                    out = librosa.load(bytes_, sr=None)
+            except Exception:


this exception is way too generic
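To illustrate the concern, here is a minimal sketch of a fallback that triggers only on one expected failure type. The names `UnsupportedFormatError`, `primary_decode`, and `fallback_decode` are hypothetical stand-ins, not librosa or vLLM APIs; which exception the real decoder raises for an unsupported codec would need to be confirmed separately.

```python
class UnsupportedFormatError(RuntimeError):
    """Hypothetical stand-in for the primary decoder's 'unknown format' error."""


def primary_decode(data: bytes) -> str:
    # Hypothetical primary decoder: only accepts data that starts with b"RIFF".
    if not data.startswith(b"RIFF"):
        raise UnsupportedFormatError("format not recognised")
    return "primary"


def fallback_decode(data: bytes) -> str:
    # Hypothetical fallback decoder that accepts anything.
    return "fallback"


def decode(data: bytes) -> str:
    try:
        return primary_decode(data)
    except UnsupportedFormatError:
        # Only the expected failure triggers the fallback.
        # Any other error (bad input type, bug in the decoder) propagates.
        return fallback_decode(data)
```

With a bare `except Exception`, an unrelated bug such as passing `None` would be silently routed to the fallback; here it surfaces as an error instead.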

NickLucche · 2025-05-21T12:39:44Z

vllm/entrypoints/openai/serving_transcription.py

+                with io.BytesIO(audio_data) as bytes_:
+                    out = librosa.load(bytes_, sr=None)
+            except Exception:
+                with tempfile.NamedTemporaryFile() as temp:


We should write to an in-memory BytesIO buffer, not a temp file. A temp file may, among other things, trigger permission issues on some deployments.
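As a rough sketch of the in-memory pattern, the example below decodes audio bytes through `io.BytesIO` without touching the filesystem. It uses the standard-library `wave` module purely for illustration; whether the fallback decoder in this PR accepts file-like objects is an assumption that would need to be checked. `make_wav_bytes` and `read_wav_info` are hypothetical helper names.

```python
import io
import wave


def make_wav_bytes(n_frames: int = 160, sample_rate: int = 16000) -> bytes:
    # Build a tiny mono 16-bit WAV of silence entirely in memory.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(sample_rate)
        writer.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def read_wav_info(audio_data: bytes) -> tuple[int, int]:
    # Wrap the raw bytes in a file-like object instead of writing a temp file.
    with io.BytesIO(audio_data) as bytes_:
        with wave.open(bytes_, "rb") as reader:
            return reader.getframerate(), reader.getnframes()
```

Because nothing is written to disk, this path does not depend on a writable temp directory in the deployment environment.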

NickLucche · 2025-05-21T12:40:34Z

vllm/entrypoints/openai/serving_transcription.py

+                    out = librosa.load(bytes_, sr=None)
+            except Exception:
+                with tempfile.NamedTemporaryFile() as temp:
+                    temp.write(audio_data)


We should probably log this path at debug or warning level.
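A minimal sketch of what logging the fallback path could look like, using the standard-library `logging` module. The logger name, the `load_with_fallback` helper, and the choice of `ValueError` as the trigger are all assumptions for illustration, not the code in this PR.

```python
import logging

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("transcription_example")


def load_with_fallback(data, primary, fallback):
    """Try `primary(data)`; on the assumed failure type, log and use `fallback`."""
    try:
        return primary(data)
    except ValueError as exc:  # assumed error type for this sketch
        # Record that the slower or less common path was taken, and why.
        logger.warning(
            "Primary audio decoder failed (%s); trying fallback decoder.", exc
        )
        return fallback(data)
```

Logging at warning level makes the fallback visible in default configurations; debug level would be quieter if the fallback turns out to be common for formats such as webm.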

NickLucche · 2025-05-21T12:50:13Z

It's also worth looking into why librosa isn't falling back to audioread as described in the docs. Is it an optional dependency, or..?

mergify bot added the frontend label May 21, 2025

cpwan force-pushed the add-webm branch 2 times, most recently from c667ae0 to 8529fe5 May 21, 2025 10:31

cpwan mentioned this pull request May 21, 2025

[Bug]: Audio transcription does not support webm #18385

Open

1 task

DarkLight1337 reviewed May 21, 2025

vllm/entrypoints/openai/serving_transcription.py

mergify bot added the ci/build label May 21, 2025

fix: support webm with audioread fallback

691656d

Signed-off-by: cpwan <[email protected]>

cpwan force-pushed the add-webm branch from e814b6f to 691656d May 21, 2025 12:44

NickLucche suggested changes May 21, 2025
