xAI: Grok Voice TTS 1.0
x-ai/grok-voice-tts-1.0
Grok Voice TTS 1.0 is a text-to-speech model from xAI. It converts text into spoken audio across 20+ languages with automatic language detection, and offers five built-in voices (Eve, Ara, Rex, Sal, Leo) covering a range of tones. Inline speech tags allow control over pauses, emphasis, pitch, speed, and vocal style. Output is available in MP3, WAV, PCM, μ-law, and A-law formats at sample rates from 8 kHz to 48 kHz, with up to 15,000 characters per request.
- Context window: 15,000 tokens
- Input: text
- Output: speech
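
Because each request accepts at most 15,000 characters, longer documents have to be split before synthesis. The following is a minimal sketch of one way to do that, not part of any xAI or OpenRouter SDK: the function name `split_for_tts` and the sentence-boundary heuristic are illustrative assumptions, and only the 15,000-character limit comes from the description above.

```python
import re

# Per-request limit taken from the model description above.
MAX_CHARS = 15_000


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper: prefers breaking at sentence boundaries and
    falls back to a hard cut for any single sentence over the limit.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single oversized sentence is cut into max_chars-sized pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would exceed the limit: start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    demo = "Hello world. How are you? Fine!"
    for piece in split_for_tts(demo, max_chars=15):
        print(len(piece), repr(piece))
```

Each returned chunk can then be sent as its own request and the resulting audio segments concatenated. Splitting at sentence boundaries is a design choice meant to keep pauses natural across segment joins; it is a heuristic and will not handle abbreviations or text without sentence punctuation gracefully.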
Model data sourced from OpenRouter.