Microsoft: MAI-Voice-2
microsoft/mai-voice-2
MAI-Voice-2 is a high-fidelity, expressive text-to-speech model from Microsoft, powered by Azure AI Speech. It synthesizes natural-sounding speech across 10+ languages with support for expressive SSML styles (cheerful, sad, excited, etc.) and speed control (0.5×–2×). Voice names follow the Azure locale format (e.g., en-US-Harper:MAI-Voice-2). Output is available in MP3 and PCM at 24 kHz.
- Input: text
- Output: speech
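
The voice-name format, expressive styles, and speed range described above can be illustrated with a small SSML builder. This is a minimal sketch, not a confirmed request format for MAI-Voice-2: it assumes the model accepts Azure AI Speech style SSML, where `mstts:express-as` selects a style and `prosody rate` takes a numeric multiplier. The helper names `build_ssml` and `locale_from_voice` are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Speed range stated in the model description (0.5x to 2x).
MIN_RATE, MAX_RATE = 0.5, 2.0


def locale_from_voice(voice: str) -> str:
    """Return the locale prefix, e.g. 'en-US' from 'en-US-Harper:MAI-Voice-2'."""
    parts = voice.split("-")
    if len(parts) < 3:
        raise ValueError(f"unexpected voice name: {voice!r}")
    return "-".join(parts[:2])


def build_ssml(text: str,
               voice: str = "en-US-Harper:MAI-Voice-2",
               style: str = "cheerful",
               rate: float = 1.0) -> str:
    """Build an SSML document (assumed Azure-style markup) for one utterance."""
    if not MIN_RATE <= rate <= MAX_RATE:
        raise ValueError(f"rate must be between {MIN_RATE} and {MAX_RATE}, got {rate}")
    locale = locale_from_voice(voice)
    # quoteattr() adds the surrounding quotes and escapes attribute values;
    # escape() protects characters such as '&' and '<' in the spoken text.
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f'xml:lang={quoteattr(locale)}>'
        f'<voice name={quoteattr(voice)}>'
        f'<mstts:express-as style={quoteattr(style)}>'
        f'<prosody rate={quoteattr(str(rate))}>{escape(text)}</prosody>'
        '</mstts:express-as></voice></speak>'
    )


if __name__ == "__main__":
    ssml = build_ssml("Welcome back! Your order has shipped.", rate=1.25)
    ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
    print(ssml)
```

The sketch only builds and validates the markup locally. How the SSML is submitted, and which output format (MP3 or PCM) is requested, depends on the serving endpoint and is left out here.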
Model data sourced from OpenRouter.