Overview
Translates audio in any supported language into English text. Unlike transcription, this endpoint always outputs English text regardless of the input language.
This page documents audio translation (POST /v1/audio/translations). For text translation, use POST /v1/translations.
Do not use recommended_for=translation for this endpoint. That recommendation scene is reserved for text translation models on POST /v1/translations.
Request Body
The audio file to translate. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm. Maximum file size is 25 MB.
The model to use. Currently only whisper-1 is supported.
An optional text to guide the model’s style or continue a previous segment. Should be in English.
The format of the output. Options: json, text, srt, verbose_json, vtt.
The sampling temperature, between 0 and 1. Higher values like 0.8 produce more random output, while lower values like 0.2 make output more focused and deterministic.
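The constraints above can be checked on the client before any upload. The following is a minimal sketch, not the official client: the helper name `build_translation_fields` and the form field names (`model`, `prompt`, `response_format`, `temperature`) are assumptions that should be verified against the actual API reference, and "25 MB" is interpreted here as 25 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Values taken from the parameter descriptions above.
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
# Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 25 * 1024 * 1024


def build_translation_fields(filename, size_bytes, model="whisper-1",
                             prompt=None, response_format=None, temperature=None):
    """Validate inputs and return the non-file form fields for the request.

    Hypothetical helper: the field names used as dict keys are assumptions.
    Optional fields are only included when the caller sets them.
    """
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {ext!r}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 25 MB limit")
    if model != "whisper-1":
        raise ValueError("only whisper-1 is currently supported")

    fields = {"model": model}
    if prompt is not None:
        fields["prompt"] = prompt
    if response_format is not None:
        if response_format not in RESPONSE_FORMATS:
            raise ValueError(f"unknown response format: {response_format!r}")
        fields["response_format"] = response_format
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        fields["temperature"] = str(temperature)
    return fields
```

The returned dictionary could then be sent as form data, alongside the audio file itself, in a multipart POST to /v1/audio/translations.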
Response
The translated text in English.
With the verbose_json format, the response also includes:
The detected language of the input audio.
The duration of the input audio in seconds.
Segments of the translated text with timestamps.
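As an illustration of consuming this response, the sketch below parses a JSON body and tolerates the extra fields being absent. The key names (`text`, `language`, `duration`, `segments`), the per-segment keys, and the sample payload are all assumptions made for the example, not values confirmed by this page.

```python
import json

# Hypothetical verbose_json payload; keys and values are illustrative only.
SAMPLE_VERBOSE = """
{
  "text": "Hello, how are you?",
  "language": "german",
  "duration": 2.5,
  "segments": [
    {"start": 0.0, "end": 2.5, "text": "Hello, how are you?"}
  ]
}
"""


def summarize_translation(raw):
    """Parse a JSON response body and return a small summary dict."""
    data = json.loads(raw)
    summary = {"text": data["text"]}
    # These keys are only expected when verbose_json was requested.
    for key in ("language", "duration"):
        if key in data:
            summary[key] = data[key]
    summary["segment_count"] = len(data.get("segments", []))
    return summary


if __name__ == "__main__":
    print(summarize_translation(SAMPLE_VERBOSE))
```

Reading the optional keys defensively lets the same function handle both the plain json format and verbose_json.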
Translation vs Transcription
| Feature | Translation | Transcription |
|---|---|---|
| Output language | Always English | Same as input |
| Use case | Convert foreign audio to English | Preserve original language |
| Language parameter | Not applicable | Optional hint |
The translation endpoint automatically detects the source language and translates to English. The language parameter from transcription is ignored.
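Since the language parameter has no effect on translation, a client that shares request-building code between the two modes may prefer to drop it explicitly. A minimal sketch, assuming a hypothetical helper `fields_for_mode` and a form field literally named `language`:

```python
def fields_for_mode(mode, fields):
    """Return a copy of the form fields appropriate for the given mode.

    Hypothetical helper: for translation, the 'language' hint is removed
    because the endpoint detects the source language itself.
    """
    if mode not in ("translation", "transcription"):
        raise ValueError(f"unknown mode: {mode!r}")
    cleaned = dict(fields)  # copy so the caller's dict is not mutated
    if mode == "translation":
        cleaned.pop("language", None)
    return cleaned
```

Removing the field on the client side does not change the result, but it keeps requests free of parameters that would be silently ignored.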