feat(api): new models for TTS, STT, + new audio features for Realtime #298

Merged
merged 1 commit into from
Mar 20, 2025
2 changes: 1 addition & 1 deletion .stats.yml
@@ -1,2 +1,2 @@
 configured_endpoints: 76
-openapi_spec_url: https://storage.googleapis.com/stainless-sdk-openapi-specs/openai%2Fopenai-b26121d5df6eb5d3032a45a267473798b15fcfec76dd44a3256cf1238be05fa4.yml
+openapi_spec_url: https://storage.googleapis.com/stainless-sdk-openapi-specs/openai%2Fopenai-c22f59c66aec7914b6ee653d3098d1c1c8c16c180d2a158e819c8ddbf476f74b.yml
7 changes: 7 additions & 0 deletions api.md
@@ -145,9 +145,16 @@ Params Types:

 ## Transcriptions

+Params Types:
+
+- <a href="https://pkg.go.dev/github.com/openai/openai-go">openai</a>.<a href="https://pkg.go.dev/github.com/openai/openai-go#TranscriptionInclude">TranscriptionInclude</a>
+
 Response Types:

 - <a href="https://pkg.go.dev/github.com/openai/openai-go">openai</a>.<a href="https://pkg.go.dev/github.com/openai/openai-go#Transcription">Transcription</a>
+- <a href="https://pkg.go.dev/github.com/openai/openai-go">openai</a>.<a href="https://pkg.go.dev/github.com/openai/openai-go#TranscriptionStreamEvent">TranscriptionStreamEvent</a>
+- <a href="https://pkg.go.dev/github.com/openai/openai-go">openai</a>.<a href="https://pkg.go.dev/github.com/openai/openai-go#TranscriptionTextDeltaEvent">TranscriptionTextDeltaEvent</a>
+- <a href="https://pkg.go.dev/github.com/openai/openai-go">openai</a>.<a href="https://pkg.go.dev/github.com/openai/openai-go#TranscriptionTextDoneEvent">TranscriptionTextDoneEvent</a>

 Methods:

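Reviewer note: the three new response types suggest a streamed transcription arrives as a series of text deltas followed by a final "done" event. The sketch below shows one way a caller might fold such events into a transcript. It is a self-contained illustration, not the SDK's API: the `streamEvent` struct and `collectTranscript` helper are hypothetical, and the event type strings (`transcript.text.delta`, `transcript.text.done`) and field names (`delta`, `text`) are assumptions about the wire format that this diff does not show.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// streamEvent is a hypothetical, flattened stand-in for the SDK's
// TranscriptionStreamEvent union. The JSON field names are assumed.
type streamEvent struct {
	Type  string `json:"type"`
	Delta string `json:"delta"` // assumed to be set on delta events
	Text  string `json:"text"`  // assumed to be set on the done event
}

// collectTranscript decodes raw JSON events in order. It appends each delta
// and returns the final text as soon as a done event is seen. If the stream
// ends without a done event, it returns whatever the deltas produced.
func collectTranscript(rawEvents []string) (string, error) {
	var b strings.Builder
	for _, raw := range rawEvents {
		var ev streamEvent
		if err := json.Unmarshal([]byte(raw), &ev); err != nil {
			return "", fmt.Errorf("decode event: %w", err)
		}
		switch ev.Type {
		case "transcript.text.delta":
			b.WriteString(ev.Delta)
		case "transcript.text.done":
			return ev.Text, nil
		}
	}
	return b.String(), nil
}

func main() {
	events := []string{
		`{"type":"transcript.text.delta","delta":"Hel"}`,
		`{"type":"transcript.text.delta","delta":"lo"}`,
		`{"type":"transcript.text.done","text":"Hello"}`,
	}
	text, err := collectTranscript(events)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(text)
}
```

Preferring the done event's text over the accumulated deltas is a design choice in this sketch; a caller that only needs incremental display could ignore the done event's payload.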
7 changes: 5 additions & 2 deletions audio.go
@@ -34,11 +34,14 @@ func NewAudioService(opts ...option.RequestOption) (r *AudioService) {
 type AudioModel = string

 const (
-	AudioModelWhisper1 AudioModel = "whisper-1"
+	AudioModelWhisper1            AudioModel = "whisper-1"
+	AudioModelGPT4oTranscribe     AudioModel = "gpt-4o-transcribe"
+	AudioModelGPT4oMiniTranscribe AudioModel = "gpt-4o-mini-transcribe"
 )

 // The format of the output, in one of these options: `json`, `text`, `srt`,
-// `verbose_json`, or `vtt`.
+// `verbose_json`, or `vtt`. For `gpt-4o-transcribe` and `gpt-4o-mini-transcribe`,
+// the only supported format is `json`.
 type AudioResponseFormat string

 const (
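Reviewer note: the updated `AudioResponseFormat` comment says the two new transcription models accept only `json`. A caller might want to check that before sending a request. The sketch below is a minimal, self-contained illustration of such a check. The constants are local copies of the values in this hunk, and `supportsResponseFormat` is a hypothetical helper, not something the SDK provides. Treating any unlisted model like `whisper-1` is an assumption made for the example.

```go
package main

import "fmt"

// Local copies of the model names added in audio.go. A real program would
// use openai.AudioModelGPT4oTranscribe and the related SDK constants.
const (
	modelWhisper1            = "whisper-1"
	modelGPT4oTranscribe     = "gpt-4o-transcribe"
	modelGPT4oMiniTranscribe = "gpt-4o-mini-transcribe"
)

// supportsResponseFormat is a hypothetical helper. It reports whether the
// documented rules allow the given response format for the given model.
func supportsResponseFormat(model, format string) bool {
	switch model {
	case modelGPT4oTranscribe, modelGPT4oMiniTranscribe:
		// Per the doc comment, only `json` is supported for these models.
		return format == "json"
	default:
		// Assumption: every other model accepts the full documented list.
		switch format {
		case "json", "text", "srt", "verbose_json", "vtt":
			return true
		}
		return false
	}
}

func main() {
	fmt.Println(supportsResponseFormat(modelGPT4oTranscribe, "srt"))
	fmt.Println(supportsResponseFormat(modelWhisper1, "srt"))
}
```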
10 changes: 7 additions & 3 deletions audiospeech.go
@@ -43,21 +43,25 @@ func (r *AudioSpeechService) New(ctx context.Context, body AudioSpeechNewParams,
 type SpeechModel = string

 const (
-	SpeechModelTTS1   SpeechModel = "tts-1"
-	SpeechModelTTS1HD SpeechModel = "tts-1-hd"
+	SpeechModelTTS1         SpeechModel = "tts-1"
+	SpeechModelTTS1HD       SpeechModel = "tts-1-hd"
+	SpeechModelGPT4oMiniTTS SpeechModel = "gpt-4o-mini-tts"
 )

 type AudioSpeechNewParams struct {
 	// The text to generate audio for. The maximum length is 4096 characters.
 	Input param.Field[string] `json:"input,required"`
 	// One of the available [TTS models](https://platform.openai.com/docs/models#tts):
-	// `tts-1` or `tts-1-hd`
+	// `tts-1`, `tts-1-hd` or `gpt-4o-mini-tts`.
 	Model param.Field[SpeechModel] `json:"model,required"`
 	// The voice to use when generating the audio. Supported voices are `alloy`, `ash`,
 	// `coral`, `echo`, `fable`, `onyx`, `nova`, `sage` and `shimmer`. Previews of the
 	// voices are available in the
 	// [Text to speech guide](https://platform.openai.com/docs/guides/text-to-speech#voice-options).
 	Voice param.Field[AudioSpeechNewParamsVoice] `json:"voice,required"`
+	// Control the voice of your generated audio with additional instructions. Does not
+	// work with `tts-1` or `tts-1-hd`.
+	Instructions param.Field[string] `json:"instructions"`
 	// The format to audio in. Supported formats are `mp3`, `opus`, `aac`, `flac`,
 	// `wav`, and `pcm`.
 	ResponseFormat param.Field[AudioSpeechNewParamsResponseFormat] `json:"response_format"`
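Reviewer note: the new `Instructions` field is documented as not working with `tts-1` or `tts-1-hd`, and `Input` is capped at 4096 characters. The sketch below shows how a caller could enforce both constraints before building `AudioSpeechNewParams`. It is self-contained and illustrative only: `speechRequest` and `validateSpeechRequest` are hypothetical names, the constants are local copies of the values in this hunk, and counting "characters" as Unicode code points is an assumption, since the doc comment does not say how length is measured.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Local copies of the model names in audiospeech.go. A real program would
// use openai.SpeechModelTTS1 and the related SDK constants.
const (
	speechModelTTS1         = "tts-1"
	speechModelTTS1HD       = "tts-1-hd"
	speechModelGPT4oMiniTTS = "gpt-4o-mini-tts"
)

// maxInputChars mirrors the documented 4096-character limit on Input.
const maxInputChars = 4096

// speechRequest is a hypothetical plain-struct stand-in for the SDK's
// AudioSpeechNewParams, holding only the fields this check needs.
type speechRequest struct {
	Model        string
	Input        string
	Instructions string
}

// validateSpeechRequest applies the constraints stated in the doc comments.
func validateSpeechRequest(r speechRequest) error {
	// Assumption: the limit is counted in Unicode code points, not bytes.
	if n := utf8.RuneCountInString(r.Input); n > maxInputChars {
		return fmt.Errorf("input has %d characters, limit is %d", n, maxInputChars)
	}
	if r.Instructions != "" && (r.Model == speechModelTTS1 || r.Model == speechModelTTS1HD) {
		return fmt.Errorf("instructions are not supported by model %q", r.Model)
	}
	return nil
}

func main() {
	err := validateSpeechRequest(speechRequest{
		Model:        speechModelTTS1,
		Input:        "Hello there.",
		Instructions: "Speak in a cheerful tone.",
	})
	fmt.Println(err)

	err = validateSpeechRequest(speechRequest{
		Model:        speechModelGPT4oMiniTTS,
		Input:        "Hello there.",
		Instructions: "Speak in a cheerful tone.",
	})
	fmt.Println(err)
}
```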
1 change: 1 addition & 0 deletions audiospeech_test.go
@@ -30,6 +30,7 @@ func TestAudioSpeechNewWithOptionalParams(t *testing.T) {
 		Input:          openai.F("input"),
 		Model:          openai.F(openai.SpeechModelTTS1),
 		Voice:          openai.F(openai.AudioSpeechNewParamsVoiceAlloy),
+		Instructions:   openai.F("instructions"),
 		ResponseFormat: openai.F(openai.AudioSpeechNewParamsResponseFormatMP3),
 		Speed:          openai.F(0.250000),
 	})