Populate the audio stream with items added to the conversation #2262


Open
daltskin opened this issue Mar 27, 2025 · 3 comments
Comments

@daltskin

Confirm this is a feature request for the Python library and not the underlying OpenAI API.

  • This is a feature request for the Python library

Describe the feature or improvement you're requesting

When using the realtime API, once a function call has been recognized and processed, you are unable to send the result down the audio stream. This is a current limitation highlighted in the API documentation:

Add a new Item to the Conversation's context, including messages, function

When the response is sent, it shows up in the conversation history, but it is not sent down the audio stream.

  await connection.conversation.item.create(
      item={
          "type": "function_call_output",
          "call_id": callid,
          "output": json.dumps(func_call_response)
      }
  )

It would be great to have this output spoken on the audio stream. One workaround at the moment is to ask the assistant to repeat itself once it has finished processing; are there any better alternatives?

Thanks

Additional context

No response

@demoncoder-crypto

Hi @daltskin,

As you've identified, when the assistant processes a function call, it adds the result to the conversation history as text, but it doesn't automatically speak that result out loud. So instead of hearing the output, users only see it, and the only workaround is to have the assistant repeat it, which isn't very smooth.

A few suggestions:

  • Modify the API so that when you add a function's output, there's an option (e.g. a stream_audio=True flag) to also send that output to the audio stream automatically. Alternatively, a new method (say, synthesize_and_stream) could be introduced to handle this.
  • Use an external tool like Pipecat as a bridge: it takes the function output, sends it through a TTS service (e.g. ElevenLabs or Google TTS), and streams the audio back to the user.
  • Manual TTS integration: send the text to a separate TTS service yourself and handle the audio playback, though that involves a bit more manual work.
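The manual-TTS route could look roughly like the sketch below. Here `synthesize` stands in for whatever TTS client you use (e.g. OpenAI's speech endpoint or ElevenLabs) and `play_audio` for your playback layer; both are hypothetical placeholders, not part of the realtime API.

```python
def speak_function_output(output_text, synthesize, play_audio):
    """Bridge a text function result onto the audio channel out-of-band:
    synthesize it with an external TTS service, then play the bytes locally.

    `synthesize` and `play_audio` are injected so any TTS/playback stack
    can be plugged in."""
    audio_bytes = synthesize(output_text)  # e.g. raw PCM or MP3 bytes
    play_audio(audio_bytes)
    return len(audio_bytes)
```

The downside, as noted above, is that this audio never passes through the realtime session itself, so the model has no awareness of what was spoken.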

Let me know if these suggestions help; if I'm on the right track, I'll try to put together a draft solution for this. Thanks!

@daltskin

daltskin commented Apr 3, 2025

Hi @demoncoder-crypto, thanks for coming back on this.

For now, I've figured out that the solution is to send a response.create message on the connection after the function call response, e.g.:

  await connection.conversation.item.create(
      item={
          "type": "function_call_output",
          "call_id": callid,
          "output": json.dumps(func_call_response)
      }
  )

  await connection.send({"type": "response.create"})
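For anyone landing here later, the two steps above can be captured in a small helper that just builds the event payloads (the helper name is mine; the event shapes match the realtime API's conversation.item.create and response.create client events as used in this thread):

```python
import json


def build_function_result_events(call_id, result):
    """Build the two client events that make the model speak a function
    result: first the function_call_output item, then a response.create
    that triggers a fresh (audio) response incorporating that output."""
    item_event = {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,
            "output": json.dumps(result),
        },
    }
    response_event = {"type": "response.create"}
    return item_event, response_event
```

Each event would then be sent over the websocket with `await connection.send(event)` (or the item via `connection.conversation.item.create`).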

However, a slightly different issue: I haven't figured out a way to inject a message before the function call to indicate that it could be a long-running task, e.g.:

event = {
    "type": "conversation.item.create",
    "item": {
        "type": "message",
        "role": "assistant",
        "content": [
            {
                "type": "text",  # assistant message items use "text"; "input_text" is for user input
                "text": "Processing please wait..",
            }
        ]
    }
}

@demoncoder-crypto

So, while sending an item.create event is an interesting idea, I suspect it won't produce the immediate audio feedback you're looking for. Triggering local audio playback on the client upon receiving the function call request is likely the most effective solution for the "processing" message.
I'd still have to test this extensively, though.
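A rough sketch of that client-side approach, assuming the function-call completion event name from the realtime API and a hypothetical `play_clip` helper for local playback:

```python
def handle_event(event_type, play_clip):
    """Client-side interim feedback: when the model finishes emitting
    function-call arguments, the client knows a (possibly long-running)
    tool call is starting, so it plays a locally stored clip such as
    'Processing, please wait...'. Returns True if the clip was played.

    The event name is assumed from the realtime API's function-call
    events; `play_clip` is whatever playback helper your client has."""
    if event_type == "response.function_call_arguments.done":
        play_clip("processing_please_wait.wav")
        return True
    return False
```

Because the clip is played locally, it gives immediate feedback without waiting for a round trip to the model.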
