Together AI raises $800M as Vipul Ved Prakash tests the open-model cloud thesis

The Series C values Together AI at $8.3 billion and gives it capital to buy the compute behind cheaper open-model inference.

By Ryan Merket · Published Jul 2, 2026, 12:43am CT

Why it matters

Together AI's round turns the open-model argument into an infrastructure financing story: cheaper inference matters only if the startup can secure enough compute and keep margins as prices fall.

Vipul Ved Prakash's Together AI said on July 1 that it raised an $800 million Series C at an $8.3 billion post-money valuation, a round that makes the open-model infrastructure provider one of the clearest investor bets that AI spending is moving from model subscriptions into production inference.

The round was led by Aramco Ventures, with participation from Vista Equity Partners, General Catalyst, Emergence Capital, NVIDIA, March Capital, Pegatron, Salesforce Ventures, and S Ventures, according to Together AI's Business Wire release. The release also said Together AI has secured commitments for more than 500 megawatts of compute capacity, to be financed separately by new investors.

Together AI's founding team is unusually deep on systems research. The company's leadership page lists Ved Prakash as founder and CEO, Ce Zhang as founder and CTO, Chris Re as founder, Tri Dao as founder and chief scientist, and Percy Liang as founder. The July 1 financing turns that research bench into a balance-sheet story.

Together AI's claim is direct. Closed frontier models are too expensive for many high-volume AI applications, especially when agents are running constantly across code, support, voice, search, and document workflows. Together AI sells the alternative: infrastructure for training, fine-tuning, and running open-weight models, plus GPU clusters and inference systems tuned for latency and cost.

The numbers should be read with care. Together AI says annual bookings crossed $1.15 billion last quarter and that it serves thousands of paying customers, including Cursor, Cognition, Decagon, ElevenLabs, and Suno. Those are company-supplied metrics, and bookings are not the same as recognized revenue. They still help explain why investors were willing to more than double the valuation from Together AI's February 2025 Series B, when the company announced a $305 million round led by General Catalyst and co-led by Prosperity7 at a $3.3 billion valuation.

The bet is inference, not model prestige

Together AI's raise lands at a point where the market is separating model performance from model economics. The major closed-model labs still define much of the frontier narrative. The budgets that matter for application companies are increasingly determined by how often a product has to call a model, how long each context is, how low latency must be, and whether every request needs a top-end proprietary model.
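The budget drivers named above can be put into a back-of-the-envelope formula. The sketch below is illustrative only: the function name, the per-million-token pricing convention, and every number in the usage note are assumptions, not figures from Together AI or any provider.

```python
def monthly_inference_cost(
    calls_per_month: int,
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    price_in_per_mtok: float,
    price_out_per_mtok: float,
) -> float:
    """Estimate monthly spend in USD from call volume, context size, and price.

    Prices are assumed to be quoted per million tokens (a hypothetical
    convention for this sketch), with separate input and output rates.
    """
    # Token cost of a single call, still scaled by one million.
    scaled_cost_per_call = (
        input_tokens_per_call * price_in_per_mtok
        + output_tokens_per_call * price_out_per_mtok
    )
    # Multiply by volume first, then convert from per-million-token pricing.
    return calls_per_month * scaled_cost_per_call / 1_000_000
```

With hypothetical inputs of 1,000,000 calls a month, 2,000 input tokens and 500 output tokens per call, and prices of $1.00 and $2.00 per million tokens, the estimate comes to $3,000 a month. Doubling either call volume or context length moves the bill roughly in proportion, which is why those variables weigh on application budgets.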

That is the opening Together AI is trying to own. The most concrete example is Decagon, the subject of a published Together AI customer case study: Decagon says it reached nearly 6x cost reduction per turn compared with closed models like GPT-5 mini and reduced p95 model latency to under 400 milliseconds on inputs up to tens of thousands of tokens. That is exactly the kind of workload where the bill matters: voice agents cannot tolerate long pauses, and 24/7 customer interactions punish expensive inference.

Together AI's product menu shows how broad the ambition has become. The company offers serverless inference, dedicated inference, GPU clusters, and fine-tuning. Its homepage frames the platform as a full-stack cloud for inference, model shaping, and pre-training, with research work on kernels, FlashAttention, and speculative decoding feeding into production.

That breadth is expensive. The $800 million round is less a trophy financing than a procurement plan. In AI infrastructure, capital does not sit idle for long. It turns into GPU reservations, power agreements, data center capacity, networking, storage, hiring, and the software required to keep utilization high enough for the economics to work.

Open models have moved from experiment to budget line

The demand case is no longer based only on open-source ideology. OpenRouter's State of AI study, which analyzed more than 100 trillion tokens of usage data, found that open-weight models reached about one-third of OpenRouter usage by late 2025. The same study said Chinese open-source models rose from a negligible base in late 2024 to nearly 30% of all model usage in some weeks, with DeepSeek, Qwen, MiniMax, Kimi, and other families taking meaningful share.

McKinsey's April 2025 report on open source technology in the age of AI, based on a survey of more than 700 technology leaders and senior developers across 41 countries, also found that 76% of respondents expected their organizations to increase use of open-source AI technologies over the next several years. The same report said respondents saw lower implementation costs and lower maintenance costs from open-source AI tools, while proprietary tools still had an advantage on speed to value.

Together AI is trying to monetize that split. If enterprises use both proprietary and open models, the winner in infrastructure may be the platform that routes workloads to the cheapest model that meets the quality bar. That favors companies with fast inference, model breadth, tuning tools, and enough hardware supply to absorb demand spikes.
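The routing idea in that paragraph, sending each workload to the cheapest model that clears a quality bar, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how Together AI or any platform routes traffic: the model names, quality scores, and prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str
    quality: float        # hypothetical evaluation score between 0 and 1
    cost_per_mtok: float  # hypothetical blended USD price per million tokens


def cheapest_meeting_bar(
    options: Sequence[ModelOption], quality_bar: float
) -> Optional[ModelOption]:
    """Return the lowest-cost model whose quality meets the bar, or None."""
    eligible = [m for m in options if m.quality >= quality_bar]
    if not eligible:
        return None
    return min(eligible, key=lambda m: m.cost_per_mtok)


# Entirely made-up catalog, used only to exercise the function.
CATALOG = [
    ModelOption("closed-frontier", quality=0.95, cost_per_mtok=10.0),
    ModelOption("open-large", quality=0.90, cost_per_mtok=2.0),
    ModelOption("open-small", quality=0.75, cost_per_mtok=0.3),
]
```

In this made-up catalog, a workload with a quality bar of 0.85 lands on the mid-priced open model, while a bar of 0.93 still requires the closed one. A production router would also have to weigh latency and available capacity, which this sketch leaves out.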

The risk is just as clear. GPU clouds and inference providers face pricing pressure as hardware supply expands and hyperscalers improve support for open-weight models. Together AI has to prove that its research systems translate into durable margins, not temporary cost advantages. The presence of Aramco Ventures, NVIDIA, and Pegatron among the investors also reveals the physical constraint behind the software pitch: open-model adoption only becomes a large business if Together AI can keep buying and powering enough compute.

The founder's thesis gets a much bigger test

Ved Prakash wrote in Together AI's announcement that he and his co-founders started Together AI four years ago because they believed generative AI should be open and widely available rather than controlled by a small number of companies. He also wrote that Together AI has become "one of the largest producers of AI tokens in the world," a claim Together AI did not quantify in the post.

A July 1 post from Aligned News on X framed the financing as proof that open-source compute demand is accelerating.

The stronger version is more specific: Together AI's Series C shows that investors believe the winning AI infrastructure businesses will be judged by inference volume, utilization, latency, and cost per token as much as by model rankings.

That is a founder's kind of bet. Ved Prakash is aiming less at model-lab prestige and more at making Together AI the cheaper way to put open models into production. The $800 million round buys the chance to find out whether that is a category-defining cloud business or a capital-hungry race against every hyperscaler with GPUs to rent.
