Zyphra Releases ZONOS2, an Open-Weight Real-Time Voice-Cloning Model

Zyphra is pairing open weights, Apache 2.0 licensing, hosted inference, and its own TTS eval in a direct challenge to closed voice platforms.

By Ryan Merket · Published Jun 12, 2026, 4:18pm CT

Why it matters

Zyphra is trying to turn open-weight voice cloning into a cloud business: the model is permissively licensed, but the easiest path to production runs through Zyphra Cloud.

Launch video: @ZyphraAI

Zyphra (@ZyphraAI) released ZONOS2, an Apache 2.0-licensed real-time text-to-speech (TTS) model for high-fidelity voice cloning, the company said Friday in a thread on X and a launch blog post.

The release is broader than a hosted API update. Zyphra is publishing model weights on Hugging Face, inference code on GitHub, and evaluation code for ZTTS1-Eval, a new benchmark the company says is designed to measure TTS quality across both clean and in-the-wild audio. In practical terms, Zyphra is trying to make the model, the serving path, and the scorecard inspectable at the same time.

ZONOS2 uses a sparse mixture-of-experts (MoE) architecture: 8 billion total parameters, with 900 million active during inference, according to Zyphra. That distinction matters because the company is not pitching ZONOS2 simply as a larger voice model. It argues that sparsity lets the system keep real-time performance while gaining expressiveness and cloning quality, a balance central to any model meant to compete with polished, closed voice platforms.
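By Zyphra's figures, roughly 11 percent of ZONOS2's parameters are active during inference. The sketch below shows that arithmetic next to a generic top-k mixture-of-experts router in plain Python, where only the highest-scoring experts are evaluated and their outputs are blended. It is a textbook illustration under assumed values (eight toy experts, top-2 selection, made-up router scores), not Zyphra's design; the article does not say how many experts ZONOS2 has or how its router works.

```python
import math

# Parameter counts as reported by Zyphra for ZONOS2.
TOTAL_PARAMS = 8_000_000_000
ACTIVE_PARAMS = 900_000_000


def active_fraction(total: int, active: int) -> float:
    """Share of the model's parameters that are active during inference."""
    return active / total


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the selected experts' outputs.

    Experts that are not selected are never called, which is where the
    compute savings of a sparse MoE come from.
    """
    out = 0.0
    for idx, w in top_k_route(router_logits, k):
        out += w * experts[idx](x)
    return out


if __name__ == "__main__":
    print(f"Active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.2%}")  # 11.25%

    # Hypothetical toy setup: eight scalar "experts" and invented router scores.
    experts = [lambda x, s=s: s * x for s in range(8)]
    logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2]
    print("Selected experts:", top_k_route(logits, k=2))
    print("Output:", moe_forward(1.0, experts, logits, k=2))
```

In a real MoE layer the experts are neural sub-networks and routing happens per token, but the principle is the same: total parameter count sets capacity, while the active count sets per-step compute.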

Zyphra also calls ZONOS2 the first open-source MoE TTS model. That claim should be read as the company's framing, not as an independently verified category milestone. The more concrete point is that Zyphra is pairing open weights with an Apache 2.0 license, which is a permissive posture for a voice-cloning model and a direct contrast to closed voice systems that expose capability mainly through hosted products.

The distribution strategy is two-track. Developers can download the weights and run the inference code themselves, while Zyphra is also offering the model through Zyphra Cloud, which the company says is powered by AMD (@AMD). Zyphra said cloud access is free for a limited promotional period, giving the company a path to seed usage while still steering production users toward hosted inference.

That makes the release a commercial bet as much as a research one. Zyphra is using openness to earn trust and developer adoption, then relying on cloud performance, convenience, and infrastructure to monetize. The question is whether an open-weight model can be good enough and fast enough that teams use it as a default building block, instead of treating it as a research artifact beside more controlled proprietary APIs.

The eval release is an important part of that pitch. Zyphra says ZTTS1-Eval focuses on TTS quality across clean and in-the-wild audio, with attention to speaker similarity and prosody rather than only word error rate. That is the right axis for voice cloning: the best clone is not necessarily the one that produces the cleanest transcript for an ASR judge, and a model can be intelligible while still failing on likeness, rhythm, affect, or delivery.
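The gap between intelligibility and likeness is easy to see by putting the two kinds of metric side by side. The sketch below computes word error rate (word-level edit distance between the script and an ASR transcript) and a speaker-similarity score (cosine similarity between speaker embeddings) in plain Python. It is a generic illustration, not ZTTS1-Eval's method: the article does not describe how the benchmark scores similarity or prosody, and the four-dimensional vectors here are made-up toy values standing in for the output of a speaker-verification model.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    script = "please confirm your appointment for tuesday"
    asr_transcript_of_clone = "please confirm your appointment for tuesday"

    # Hypothetical toy embeddings; real ones come from a speaker-verification model.
    target_speaker = [0.9, 0.1, 0.3, 0.2]
    cloned_voice = [0.1, 0.8, 0.2, 0.5]

    print("WER:", word_error_rate(script, asr_transcript_of_clone))  # prints "WER: 0.0"
    print("Speaker similarity:", round(cosine_similarity(target_speaker, cloned_voice), 3))
```

Here the transcript matches the script exactly, so WER is zero, while the toy embeddings point in quite different directions and the similarity score lands well below 1. A benchmark that scored only WER would call this clone perfect.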

It also gives Zyphra more surface area for scrutiny. By releasing eval code, the company is inviting developers to test whether its benchmark reflects real-world voice-cloning quality or merely formalizes the criteria where ZONOS2 performs well. For a model whose appeal depends on fidelity and speed, third-party replication of those quality claims will matter more than the launch framing.

The significance of ZONOS2, then, is not just that Zyphra has released another text-to-speech model. It is that the company is compressing several open-source AI playbooks into one voice release: a permissive license, public weights, runnable code, a benchmark, and a hosted inference business. If the model's real-time and cloning claims hold up outside Zyphra's own materials, it could put pressure on closed voice vendors to justify why developers should accept less transparency for similar capability.
