Populate the audio stream with items added to the conversation #2262
Comments
Hi @daltskin, as you've identified, when the assistant processes a function call it adds the result to the conversation history as text, but it doesn't automatically speak that result out loud. So instead of hearing the output, users only see it, and the only workaround is to have the assistant repeat it, which isn't very smooth.
My suggestions:
Modify the API: when you add a function's output, there could be an option (like a stream_audio=True flag) to also send that output to the audio stream automatically. Alternatively, a new method (say, synthesize_and_stream) could be introduced to handle this.
Use external tools (like Pipecat): Pipecat acts as a bridge by taking the function output, sending it through a TTS service (e.g., ElevenLabs or Google TTS), and then streaming the audio back to the user.
Manual TTS integration: you can also send the text to a separate TTS service yourself and handle the audio playback, though that involves a bit more manual work.
Let me know if these suggestions are any help. If I'm on the right track, I'll try to put together a draft solution for this. Thanks
Hi @demoncoder-crypto, thanks for coming back on this. For now, the solution I've found is to add the function call output to the conversation and then request a new response:
await connection.conversation.item.create(
item={
"type": "function_call_output",
"call_id": callid,
"output": json.dumps(func_call_response)
}
)
await connection.send({"type": "response.create"})
However, I have a slightly different issue. I haven't figured out a way to inject a message before the function call to signal that it could be a long-running task, e.g.:
event = {
"type": "conversation.item.create",
"item": {
"type": "message",
"role": "assistant",
"content": [
{
"type": "input_text",
"text": "Processing please wait..",
}
]
}
}
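For reference, the workaround described in this comment can be wrapped in a small helper. This is only a sketch: `function_output_events`, `send_function_output`, and `RecordingConnection` are hypothetical names that are not part of openai-python, and the sketch assumes the connection exposes an async `send` that accepts a plain event dict, as in the `connection.send({"type": "response.create"})` call above.

```python
import asyncio
import json
from typing import Any


def function_output_events(call_id: str, result: Any) -> list[dict]:
    """Build the two client events used in the workaround (hypothetical helper)."""
    return [
        {
            # Add the function result to the conversation history.
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": call_id,
                "output": json.dumps(result),
            },
        },
        # Ask the model for a new response, which is what produces audio.
        {"type": "response.create"},
    ]


async def send_function_output(connection: Any, call_id: str, result: Any) -> None:
    """Send both events in order over any object with an async send()."""
    for event in function_output_events(call_id, result):
        await connection.send(event)


class RecordingConnection:
    """Stand-in for a real connection; it only records what was sent."""

    def __init__(self) -> None:
        self.sent: list[dict] = []

    async def send(self, event: dict) -> None:
        self.sent.append(event)


if __name__ == "__main__":
    conn = RecordingConnection()
    asyncio.run(send_function_output(conn, "call_123", {"temperature_c": 21}))
    print([event["type"] for event in conn.sent])
```

Keeping the payload construction separate from the sending makes it possible to check the events without opening a realtime session.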
So, while sending an item.create event is an interesting idea, I suspect it won't produce the immediate audio feedback you're looking for. Triggering local audio playback on the client upon receiving the function call request is likely the most effective solution for the "processing" message.
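A minimal sketch of that client-side approach follows. It assumes server events are available as plain dicts and that the function call request shows up as a `response.function_call_arguments.done` event. `handle_server_event`, `play_local_clip`, and the `please_wait.wav` file name are hypothetical and stand in for whatever audio playback the client already has.

```python
from typing import Callable

# Server event type assumed to signal that the model has requested a function call.
FUNCTION_CALL_EVENT = "response.function_call_arguments.done"


def handle_server_event(
    event: dict,
    play_local_clip: Callable[[str], None],
    clip: str = "please_wait.wav",  # hypothetical pre-recorded "processing" clip
) -> bool:
    """Play a local clip when a function call is requested.

    Returns True if the clip was triggered, False for any other event.
    """
    if event.get("type") == FUNCTION_CALL_EVENT:
        # Playback happens on the client, so it does not wait on the model.
        play_local_clip(clip)
        return True
    return False


if __name__ == "__main__":
    played: list[str] = []
    handle_server_event({"type": FUNCTION_CALL_EVENT}, played.append)
    print(played)
```

When the SDK delivers events as typed objects rather than dicts, the same check would compare `event.type` instead of `event.get("type")`.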
Confirm this is a feature request for the Python library and not the underlying OpenAI API.
Describe the feature or improvement you're requesting
When using the Realtime API, once a function call has been recognized and processed, you are unable to send the result down the audio stream. This is a current limitation highlighted in the API documentation:
openai-python/src/openai/resources/beta/realtime/realtime.py
Line 750 in f66d2e6
When the response is sent, it shows up in the history but is not sent down the audio stream.
It would be great to have this put on the audio stream. One workaround at the moment is to ask the assistant to repeat itself once it has finished processing. Are there any better alternatives?
Thanks
Additional context
No response