Published 3 days ago • Updated 21 hours ago

OpenAI Unveils Three Audio Models for Real-Time Voice Tasks

On Thursday, OpenAI added three new voice models to its Realtime API: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. The models are designed to help developers create more conversational software agents.
The launch follows viral criticism from TikTok user Husk regarding previous models' inability to accurately 'start a timer,' prompting OpenAI to build systems that better handle interruptions and complex user requests.
GPT-Realtime-2, the flagship model featuring 'GPT-5-class reasoning,' is priced at $32 per 1M audio input tokens and $64 per 1M output tokens; the translation model supports more than 70 input languages and 13 output languages.
Companies including Zillow, Priceline, and Deutsche Telekom are currently testing these models to develop voice assistants that can 'keep pace' with users, enabling actions like scheduling tours during live conversations.
Moving beyond simple chat, these tools enable 'voice-to-action' capabilities: developers can build agents that listen, reason, and perform complex workflows in real time, marking a broader shift in computing interfaces.

Insights by Ground AI
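
As a rough illustration of the GPT-Realtime-2 pricing quoted in the summary above, the sketch below estimates what a session might cost. Only the two rates ($32 per 1M audio input tokens, $64 per 1M output tokens) come from the summary; the helper name `estimate_cost_usd` and the sample token counts are hypothetical, and actual billing may include other charges not covered here.

```python
# Rates quoted in the summary above, in USD per 1M tokens.
INPUT_RATE_PER_MILLION = 32.0   # audio input tokens
OUTPUT_RATE_PER_MILLION = 64.0  # output tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimate cost in USD from token counts."""
    total = (
        input_tokens * INPUT_RATE_PER_MILLION
        + output_tokens * OUTPUT_RATE_PER_MILLION
    )
    return total / 1_000_000


# Hypothetical session: 50,000 input tokens and 20,000 output tokens.
print(f"${estimate_cost_usd(50_000, 20_000):.2f}")  # prints "$2.88"
```

At these rates an output token costs twice as much as an input token, so under this simple model the volume of generated speech weighs more heavily on the bill than the volume of audio heard.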

31 Articles

OpenAI launches GPT-Realtime-2 and two new voice API models

GPT-Realtime-2 brings GPT-5-class reasoning to live voice. A separate translation model covers 70+ input languages. A streaming Whisper variant handles transcription. The pricing is aggressive enough to make the comparison unavoidable. OpenAI released three new voice models in its API, broadening the range of surfaces where developers can plug GPT-class reasoning into live audio. The […] This story continues at The Next Web

1 day ago·Amsterdam, Netherlands (Kingdom of the)


Digital Trends

Center

OpenAI’s new voice AI can listen, think, and talk back in 70+ languages

OpenAI launched three new audio models that can reason, translate across 70+ languages, and transcribe speech in real time, making voice a genuinely useful interface for developers.

2 days ago·United States


TechCrunch

Center

OpenAI launches new voice intelligence features in its API

The new features could be handy for customer service systems, but OpenAI says they have applications that work across a variety of other fields, including education and creator platforms.

2 days ago·United States


Interesting Engineering

Center

OpenAI launches GPT-Realtime-2 for smarter live voice AI interactions

OpenAI has introduced three new audio models through its API, expanding its push into real-time voice AI for developers. The launch includes GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper, each targeting a different part of live voice interaction. The company said the new models aim to make voice software more useful in everyday situations. That includes handling conversations while driving, navigating airports, or getting cust…

2 days ago
