ElevenLabs launches Speech Engine to turn chat agents into voice with one prompt
The new pipeline layers onto existing stacks, adds 70+ languages, enterprise compliance, and pricing from 8 cents per minute via ElevenAPI.
By Ryan Merket
Why it matters
Founders shipping chat-based agents can add voice without rebuilding their stack. A single-vendor pipeline for text-to-speech (TTS), speech recognition (ASR), and orchestration reduces integration risk, addresses enterprise compliance requirements, and shortens the time to production voice experiences.

In a thread on X, ElevenLabs introduced Speech Engine, a developer offering that it says turns existing chat agents into full voice agents with one prompt. The product bundles the company's speech, transcription, and voice orchestration models into a single pipeline.
https://x.com/ElevenLabs/status/2057155693623361667
The company says Speech Engine sits on top of a team's current agent, so nothing needs to be rearchitected. Installation is positioned as a one-command setup via its skill: npx skills add elevenlabs/skills --skill speech-engine. Once integrated, developers can add expressive, human-like voices across 70+ languages.
On the capture side, ElevenLabs highlights transcription optimized for conversational use, with ultra-low latency and robustness to noisy, real-world environments. For regulated deployments, the company is pitching enterprise-grade protections including SOC 2, HIPAA, and GDPR support, plus EU data residency and a Zero Retention Mode.
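To make the "sits on top of the current agent" idea concrete, here is a minimal conceptual sketch of one voice turn: transcribe the caller's audio, pass the text to the unchanged chat agent, then synthesize the reply. Every name in it (transcribe, synthesize, voice_turn, echo_agent) is a hypothetical stand-in, not an ElevenLabs API, and the "audio" is simply UTF-8 text bytes so the example runs without any service.

```python
from typing import Callable

# Hypothetical type: any existing chat agent that maps text to text.
ChatAgent = Callable[[str], str]


def transcribe(audio: bytes) -> str:
    """Stand-in for speech-to-text. Assumption: audio is UTF-8 text bytes."""
    return audio.decode("utf-8")


def synthesize(text: str) -> bytes:
    """Stand-in for text-to-speech. Assumption: returns UTF-8 text bytes."""
    return text.encode("utf-8")


def voice_turn(audio_in: bytes, agent: ChatAgent) -> bytes:
    """Wrap an unchanged chat agent with speech on both sides."""
    user_text = transcribe(audio_in)   # speech in -> text
    reply_text = agent(user_text)      # existing agent logic, untouched
    return synthesize(reply_text)      # text -> speech out


def echo_agent(message: str) -> str:
    """Toy chat agent used only for illustration."""
    return f"You said: {message}"
```

Calling voice_turn(b"hello", echo_agent) runs the toy agent through the wrapper. The point of the design is that the agent function is passed in as-is, so the voice layer can be added or removed without touching it.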

ElevenLabs also notes that teams can migrate to ElevenAgents, its broader agent platform, for additional deployment channels, monitoring, analytics, and tooling. A full product walkthrough from the AI Engineer (@aiDotEngineer) conference in London was shared on YouTube. Speech Engine is available now in ElevenAPI, with pricing starting at 8 cents per minute and decreasing with scale. Details are on the product page at elevenlabs.io/speech-engine.
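For a rough sense of budget, the sketch below estimates an upper bound on cost from the stated starting rate of 8 cents per minute. The function name and the 10,000-minute figure are illustrative assumptions. The article does not give volume tiers, so the discount "with scale" is not modeled and real bills at volume should come in lower.

```python
from decimal import Decimal

# Starting rate quoted in the announcement: 8 cents per minute.
STARTING_RATE_USD_PER_MIN = Decimal("0.08")


def estimate_cost_usd(minutes: int,
                      rate: Decimal = STARTING_RATE_USD_PER_MIN) -> Decimal:
    """Upper-bound cost at a flat per-minute rate (no volume discounts)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return (Decimal(minutes) * rate).quantize(Decimal("0.01"))


# Hypothetical workload: 10,000 voice minutes in a month.
monthly = estimate_cost_usd(10_000)
print(f"${monthly}")  # prints "$800.00"
```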