Microsoft: MAI-Voice-2

microsoft/mai-voice-2

MAI-Voice-2 is a high-fidelity, expressive text-to-speech model from Microsoft, powered by Azure AI Speech. It synthesizes natural-sounding speech across 10+ languages with support for expressive SSML styles (cheerful, sad, excited, etc.) and speed control (0.5×–2×). Voice names follow the Azure locale format (e.g., en-US-Harper:MAI-Voice-2). Output is available in MP3 and PCM at 24 kHz.
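The sketch below shows how the SSML styles, the speed range, and the voice-name format fit together. It assembles an SSML document in the shape Azure AI Speech uses (`<voice>`, `<mstts:express-as>`, `<prosody>`). Several points are assumptions, not facts stated on this page:

- MAI-Voice-2 voices accept these standard Azure elements.
- The `rate` attribute takes a plain multiplier such as `1.25`.
- The locale can be read from the first two hyphen-separated parts of the voice name.

This page also does not say whether OpenRouter forwards SSML. `build_ssml` is a hypothetical helper, not part of any SDK.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str, style: Optional[str] = None, speed: float = 1.0) -> str:
    """Assemble an Azure-style SSML string (hypothetical helper, not an official API)."""
    # The model description states a 0.5x-2x speed range.
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be between 0.5 and 2.0, got {speed}")

    # Voice names look like "en-US-Harper:MAI-Voice-2". Assumption: the locale
    # is the first two hyphen-separated parts of the name before the colon.
    base_name = voice.split(":", 1)[0]
    locale = "-".join(base_name.split("-")[:2])

    # Escape the text so characters like "&" and "<" stay valid XML.
    # Assumption: prosody rate accepts a multiplier such as "1.25".
    body = f'<prosody rate="{speed:g}">{escape(text)}</prosody>'

    # Styles such as "cheerful", "sad", or "excited" wrap the prosody element.
    if style is not None:
        body = f"<mstts:express-as style={quoteattr(style)}>{body}</mstts:express-as>"

    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(locale)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Great news & more!", "en-US-Harper:MAI-Voice-2", style="cheerful", speed=1.25))
```

The helper checks the speed range before building any markup, so an out-of-range value fails early instead of producing a request the service might reject.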

  • Input: text
  • Output: speech
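Raw PCM output has no container header, so most audio players cannot open it directly. The description gives only the 24 kHz sample rate. This sketch therefore assumes 16-bit signed, mono samples, which works out to 24,000 × 2 = 48,000 bytes per second of audio. It wraps the bytes in a WAV file using Python's standard `wave` module. `pcm_to_wav` and `pcm_duration_seconds` are hypothetical helper names.

```python
import io
import wave

SAMPLE_RATE = 24_000  # stated in the model description
SAMPLE_WIDTH = 2      # assumption: 16-bit signed samples
CHANNELS = 1          # assumption: mono audio


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM bytes in a WAV container (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still readable.
    return buf.getvalue()


def pcm_duration_seconds(pcm: bytes) -> float:
    """Playback length implied by a raw PCM buffer under the assumptions above."""
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


if __name__ == "__main__":
    silence = b"\x00\x00" * SAMPLE_RATE  # one second of 16-bit mono silence
    print(pcm_duration_seconds(silence), len(pcm_to_wav(silence)))
```

If the service actually returns a different bit depth or channel count, only the two assumed constants need to change.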

Model data sourced from OpenRouter.