Populate the audio stream with items added to the conversation #2262


Open
daltskin opened this issue Mar 27, 2025 · 3 comments
Comments

@daltskin

Confirm this is a feature request for the Python library and not the underlying OpenAI API.

  • This is a feature request for the Python library

Describe the feature or improvement you're requesting

When using the realtime API, once a function call has been recognized and processed, you are unable to send the result down the audio stream. This is a current limitation highlighted in the API documentation:

Add a new Item to the Conversation's context, including messages, function

When the response is sent, it shows up in the conversation history, but it is not sent down the audio stream.

  await connection.conversation.item.create(
      item={
          "type": "function_call_output",
          "call_id": callid,
          "output": json.dumps(func_call_response)
      }
  )

It would be great to have this output spoken on the audio stream. One workaround at the moment is to ask the assistant to repeat itself once it has finished processing; are there any better alternatives?

Thanks

Additional context

No response

@demoncoder-crypto

Hi @daltskin,

As you've identified, when the assistant processes a function call, it adds the result to the conversation history as text, but it doesn't automatically speak that result out loud. So instead of hearing the output, users only see it, and the only workaround is to have the assistant repeat it, which isn't very smooth.

A few suggestions:

  • Modify the API so that when you add a function's output, there's an option (e.g. a stream_audio=True flag) to also send that output to the audio stream automatically. Alternatively, a new method (say, synthesize_and_stream) could be introduced to handle this.
  • Use an external tool like Pipecat as a bridge: it takes the function output, sends it through a TTS service (e.g. ElevenLabs or Google TTS), and streams the audio back to the user.
  • Manual TTS integration: send the text to a separate TTS service yourself and handle the audio playback, though that involves a bit more manual work.
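The manual-TTS route could look roughly like the sketch below. Here `synthesize` stands in for whatever TTS client you use (e.g. OpenAI's speech endpoint or ElevenLabs) and `play_audio` for your playback layer; both are hypothetical placeholders, not part of the realtime API.

```python
def speak_function_output(output_text, synthesize, play_audio):
    """Bridge a text function result onto the audio channel out-of-band:
    synthesize it with an external TTS service, then play the bytes locally.

    `synthesize` and `play_audio` are injected so any TTS/playback stack
    can be plugged in."""
    audio_bytes = synthesize(output_text)  # e.g. raw PCM or MP3 bytes
    play_audio(audio_bytes)
    return len(audio_bytes)
```

The downside, as noted above, is that this audio never passes through the realtime session itself, so the model has no awareness of what was spoken.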

Let me know if these suggestions help; if I'm on the right track, I'll try to put together a draft solution for this. Thanks!

@daltskin

daltskin commented Apr 3, 2025

Hi @demoncoder-crypto, thanks for coming back on this.

For now, I've figured out that the solution is to send a response.create message on the connection after the function call response, e.g.:

  await connection.conversation.item.create(
      item={
          "type": "function_call_output",
          "call_id": callid,
          "output": json.dumps(func_call_response)
      }
  )

  await connection.send({"type": "response.create"})
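For anyone landing here later, the two steps above can be captured in a small helper that just builds the event payloads (the helper name is mine; the event shapes match the realtime API's conversation.item.create and response.create client events as used in this thread):

```python
import json


def build_function_result_events(call_id, result):
    """Build the two client events that make the model speak a function
    result: first the function_call_output item, then a response.create
    that triggers a fresh (audio) response incorporating that output."""
    item_event = {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,
            "output": json.dumps(result),
        },
    }
    response_event = {"type": "response.create"}
    return item_event, response_event
```

Each event would then be sent over the websocket with `await connection.send(event)` (or the item via `connection.conversation.item.create`).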

However, a slightly different issue: I haven't figured out a way to inject a message before the function call to indicate that it could be a long-running task, e.g.:

event = {
    "type": "conversation.item.create",
    "item": {
        "type": "message",
        "role": "assistant",
        "content": [
            {
                "type": "text",  # assistant message items use "text"; "input_text" is for user input
                "text": "Processing please wait..",
            }
        ]
    }
}

@demoncoder-crypto

So, while sending an item.create event is an interesting idea, I suspect it won't produce the immediate audio feedback you're looking for. Triggering local audio playback on the client upon receiving the function call request is likely the most effective solution for the "processing" message.
I'd still have to test this extensively, though.
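A rough sketch of that client-side approach, assuming the function-call completion event name from the realtime API and a hypothetical `play_clip` helper for local playback:

```python
def handle_event(event_type, play_clip):
    """Client-side interim feedback: when the model finishes emitting
    function-call arguments, the client knows a (possibly long-running)
    tool call is starting, so it plays a locally stored clip such as
    'Processing, please wait...'. Returns True if the clip was played.

    The event name is assumed from the realtime API's function-call
    events; `play_clip` is whatever playback helper your client has."""
    if event_type == "response.function_call_arguments.done":
        play_clip("processing_please_wait.wav")
        return True
    return False
```

Because the clip is played locally, it gives immediate feedback without waiting for a round trip to the model.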
