OpenAI launches three real-time voice API models; GPT-Realtime-2 is the first with GPT-5-class reasoning

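On May 7, OpenAI released three real-time voice models for developers, all available through the Realtime API. GPT-Realtime-2 is the first voice model with GPT-5-class reasoning: it can reason during a call while keeping the conversation fluid, and it supports parallel tool calls. It improves on its predecessor, GPT-Realtime-1.5, by 15.2% on the audio intelligence benchmark Big Bench Audio and by 13.8% on the multi-turn conversation evaluation Audio MultiChallenge. It is billed per audio token ($32 per million input tokens, $64 per million output tokens). GPT-Realtime-Translate is a real-time translation model that accepts more than 70 input languages and produces 13 output languages, keeping pace with the speaker; it is billed per minute ($0.034 per minute). GPT-Realtime-Whisper is a streaming speech-to-text model that outputs text in real time as the speaker talks; it is also billed per minute ($0.017 per minute). All three models are available for testing in the OpenAI Playground.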
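To make the two billing schemes above concrete, here is a minimal Python sketch of the arithmetic. The prices are the ones quoted in this summary; the function names and the example token and minute counts are hypothetical, chosen only for illustration. The summary does not say how many audio tokens correspond to a minute of speech, so the sketch takes token counts as given and ignores any charges other than the audio rates listed here.

```python
# Prices as quoted in the summary above (USD). Everything else is illustrative.
REALTIME_2_INPUT_PER_M_TOKENS = 32.0    # GPT-Realtime-2, audio input
REALTIME_2_OUTPUT_PER_M_TOKENS = 64.0   # GPT-Realtime-2, audio output
TRANSLATE_PER_MINUTE = 0.034            # GPT-Realtime-Translate
WHISPER_PER_MINUTE = 0.017              # GPT-Realtime-Whisper


def realtime_2_cost(input_audio_tokens: int, output_audio_tokens: int) -> float:
    """Estimated USD cost of GPT-Realtime-2 audio tokens (hypothetical helper)."""
    total = (input_audio_tokens * REALTIME_2_INPUT_PER_M_TOKENS
             + output_audio_tokens * REALTIME_2_OUTPUT_PER_M_TOKENS)
    return total / 1_000_000


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Estimated USD cost of a per-minute model (hypothetical helper)."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Hypothetical usage figures, not measurements.
    print(f"Realtime-2: ${realtime_2_cost(1_000_000, 500_000):.2f}")
    print(f"Translate, 10 min: ${per_minute_cost(10, TRANSLATE_PER_MINUTE):.3f}")
    print(f"Whisper, 10 min: ${per_minute_cost(10, WHISPER_PER_MINUTE):.3f}")
```

Because output audio costs twice as much per token as input audio, the share of model speech in a conversation drives the GPT-Realtime-2 bill more than the share of user speech.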

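OpenAI says the release marks a leap for real-time voice from "question-and-answer interaction" to "voice interfaces that can actually get tasks done": the models can keep listening, reasoning, translating, transcribing, and calling tools throughout a conversation. Enterprise customers already testing the models include Zillow (for voice-agent customer service) and Deutsche Telekom (for cross-language customer service); according to OpenAI, both reported clear improvements in call success rate and compliance robustness. In addition, all three models have built-in active content classifiers that can halt a conversation when a violation is detected, and they support EU Data Residency and enterprise privacy commitments.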

OpenAI | TechCrunch | 9to5Mac

OpenAI launches new voice intelligence features in its API | TechCrunch

The new features could be handy for customer service systems, but OpenAI says they have applications that work across a variety of other fields, including education and creator platforms.

TechCrunch (techcrunch.com)

OpenAI has new voice models that reason, translate, and transcribe as you speak - 9to5Mac

OpenAI has just released three new realtime voice models that it says will “unlock a new class of voice apps...

9to5Mac (9to5mac.com)
