Add dedicated transcription interface for audio-to-text models
Current Behavior
The README currently shows audio transcription support through the chat interface:
```ruby
# Analyze audio recordings
chat.ask 'Describe this meeting', with: { audio: 'meeting.wav' }
```
However, this doesn't work. The library includes specific transcription models (gpt-4o-transcribe, gpt-4o-mini-transcribe) but attempting to use them results in errors. These models are distinct from audio conversation models (gpt-4o-audio-preview) and text-to-speech models (gpt-4o-mini-tts).
Using the chat interface fails because transcription models aren't chat models:

```ruby
chat = RubyLLM.chat(model: 'gpt-4o-transcribe')
chat.ask('Transcribe this', with: { audio: 'audio.mp3' })
# Error: This is not a chat model and thus not supported in the v1/chat/completions endpoint
```
No dedicated transcription method exists:
```ruby
RubyLLM.transcribe('audio.mp3', model: 'gpt-4o-transcribe')
# Error: undefined method 'transcribe' for module RubyLLM
```
Desired Behavior
Add a dedicated transcription interface consistent with other RubyLLM operations:
```ruby
# Simple usage
transcription = RubyLLM.transcribe('audio.mp3', model: 'gpt-4o-transcribe')
puts transcription.text

# With options
transcription = RubyLLM.transcribe(
  'audio.mp3',
  model: 'gpt-4o-transcribe',
  language: 'en',                           # Optional language hint
  prompt: 'This is a technical discussion'  # Optional context
)
```
This would:

- Provide a consistent interface for audio transcription
- Support different transcription models
- Match the pattern of other RubyLLM operations (chat, paint, embed); a rough sketch follows this list
- Allow for future expansion to other providers' transcription models
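As a rough illustration of how such a top-level method could be wired, here is a minimal sketch that calls OpenAI's /v1/audio/transcriptions endpoint directly. None of this is existing RubyLLM code: the Transcription struct and the method internals are assumptions invented for this sketch.

```ruby
# Hypothetical sketch only; RubyLLM does not ship any of these internals today.
require 'net/http'
require 'uri'
require 'json'

module RubyLLM
  # Minimal value object for a transcription result (invented for this sketch).
  Transcription = Struct.new(:text, :model, keyword_init: true)

  # Top-level entry point mirroring RubyLLM.chat / RubyLLM.paint / RubyLLM.embed.
  def self.transcribe(audio_path, model:, language: nil, prompt: nil)
    uri = URI('https://api.openai.com/v1/audio/transcriptions')

    form = [['file', File.open(audio_path, 'rb')], ['model', model]]
    form << ['language', language] if language
    form << ['prompt', prompt] if prompt

    request = Net::HTTP::Post.new(uri)
    request['Authorization'] = "Bearer #{ENV.fetch('OPENAI_API_KEY')}"
    request.set_form(form, 'multipart/form-data')

    response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
      http.request(request)
    end

    Transcription.new(text: JSON.parse(response.body)['text'], model: model)
  end
end
```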
Current Workaround
Until this feature is implemented, users need to use the OpenAI client directly for transcription:
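For example, a sketch using the ruby-openai gem (the gem choice is an assumption here; any OpenAI client with multipart upload support would work):

```ruby
# Workaround: call OpenAI's transcription endpoint through the ruby-openai gem.
require 'openai'

client = OpenAI::Client.new(access_token: ENV.fetch('OPENAI_API_KEY'))

response = client.audio.transcribe(
  parameters: {
    model: 'gpt-4o-transcribe',
    file: File.open('audio.mp3', 'rb')
  }
)

puts response['text']
```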
Documentation
The README needs to be updated to remove the misleading example of audio support through the chat interface. Instead, it should document the new dedicated transcription interface, making it clear that audio processing is a separate operation from chat, similar to how image generation (paint) and embeddings are handled.
The example in the README isn't misleading – we absolutely do support audio in chat today. Our test suite has working examples with gpt-4o-audio-preview models.
What we don't have is a dedicated transcription-only interface for models like gpt-4o-transcribe, which is a fair point.
Go ahead and open a PR! This would be a nice addition to the API that fits our pattern of simple, top-level methods.
@crmne Sorry, you're absolutely right, and it's right there in the chat guide. I have edited the reported issue to remove any reference to the lack of audio support. I'll take a look at putting together a PR.