Lindy CEO says DeepSeek V4 replaced Anthropic in production, with a caveat

Lindy claims performance improved after the move, but says Anthropic Opus may still handle some failed tasks.


Why it matters

Lindy's claim points to a shift from model loyalty to model routing: AI companies may increasingly treat frontier labs as interchangeable suppliers, keeping premium models only for edge cases.

Illustration: a diagram of the primary and backup computational systems of an AI workflow.

Lindy's CEO said the company moved production workloads from Anthropic to DeepSeek V4 and saw performance improve, according to a thread on X that framed the switch as both a cost and product-quality win.

Lindy CEO on X

The claim is notable because it is not just another model-price comparison. Lindy is saying it changed what runs its product in production. But the strongest supported version of the story is narrower than the headline: Lindy says DeepSeek V4 now carries production workloads that previously ran on Anthropic, while the thread also says Lindy will "probably still escalate to opus" when it detects that a task is failing.

That caveat matters. A full replacement story is cleaner than a routing story, but production AI systems are often judged at the margins, where a more expensive model can still be worth calling for the hardest or highest-risk cases. Lindy's move looks less like a simple vendor swap and more like a bet that DeepSeek (@deepseek) can handle the bulk of work well enough that Anthropic's highest-end model becomes an exception path.

The work behind the switch

The unnamed Lindy CEO did not publish benchmark data, cost figures, latency numbers, task-completion rates, or an evaluation methodology in the thread. What the thread did disclose is that the migration was operationally expensive.

"Super proud of the team for pulling off what ended up being 100x more work than we thought -- you have no idea how much infra and internal tooling we had to build to get to this point," the CEO wrote in the continuation thread, adding that an engineering blog post would follow. In another reply to Thanh Pham (@runsonai), the CEO said Lindy "literally had to build a GEPA prompt iterator."

That is the part operators should watch. The cost savings from changing model providers can be overwhelmed by migration work if prompts, evals, fallbacks, observability, and routing logic are brittle. Lindy's thread suggests the company's edge was not simply finding a cheaper model, but building enough internal tooling to trust the change in production.

Anthropic is still in the picture

The thread is unusually careful about Anthropic, even while describing pressure from Chinese models. The CEO wrote that they remain "a big fan of Anthropic" and expect it to be fine because of enterprise relationships, developer brand, future capacity, next-generation models, and moving up the stack. Those are the CEO's claims, not independently verified performance data.

Anthropic, founded in 2021 by former OpenAI employees including Dario Amodei and Daniela Amodei, is best known for its Claude model family and its AI-safety positioning. Lindy's comment lands in a market where model buyers increasingly have leverage: the question is no longer whether a frontier lab can produce a strong model, but whether it can remain the default choice when customers have credible alternatives and can route around cost or capacity constraints.

The Opus fallback is especially important. In a reply to Quinn Slack (@sqs), the CEO said Lindy has the fallback because of an "absurd max plan subsidy" and will probably still escalate to Opus when task failure is detected, though the CEO described that usage as marginal. That complicates the "100%" language: it does not amount to a permanent exclusion of Anthropic from Lindy's stack.
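Lindy has not published its routing code, so the following is only a minimal sketch of the general pattern the thread describes: try a default model first and call a premium model only when a failure is detected. Every name here (`route_with_escalation`, `looks_failed`, the `fake_*` stand-ins) is hypothetical, and the failure check is a toy; nothing below reflects Lindy's actual implementation or any provider's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RouteResult:
    output: str
    model: str       # label of the model that produced the final output
    escalated: bool  # True when the fallback path was used


def route_with_escalation(
    task: str,
    default_model: Callable[[str], str],
    premium_model: Callable[[str], str],
    looks_failed: Callable[[str, str], bool],
) -> RouteResult:
    """Try the default model first; call the premium model only on detected failure."""
    first_attempt = default_model(task)
    if not looks_failed(task, first_attempt):
        return RouteResult(first_attempt, "default", False)
    return RouteResult(premium_model(task), "premium", True)


# Hypothetical stand-ins so the sketch runs without any provider SDK.
def fake_default(task: str) -> str:
    return "" if "hard" in task else f"default answer to: {task}"


def fake_premium(task: str) -> str:
    return f"premium answer to: {task}"


def empty_output_check(task: str, output: str) -> bool:
    # Toy failure detector (an assumption): treat an empty reply as a failure.
    return not output.strip()


easy = route_with_escalation("summarize inbox", fake_default, fake_premium, empty_output_check)
hard = route_with_escalation("hard refactor", fake_default, fake_premium, empty_output_check)
print(easy.model, easy.escalated)
print(hard.model, hard.escalated)
```

In a design like this, the hard part is the failure detector rather than the router: if it misses failures, the premium path never fires, and if it is too eager, escalation stops being marginal.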

The missing proof

Lindy has not provided the numbers that would let outsiders evaluate the claim: no before-and-after cost delta, no accuracy or completion-rate benchmark, no task mix, no sample size, and no breakdown of how often Lindy still expects to call Opus. Without those, the performance claim should be treated as Lindy's account of its own production experience, not evidence that DeepSeek V4 is generally better than Anthropic models.
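To see why the escalation rate matters so much, consider a rough cost sketch. It assumes, purely for illustration, that every task pays for one default-model call and that escalated tasks additionally pay for one premium call. The function names and the numbers in the usage lines are invented; they are not Lindy's figures or any provider's pricing.

```python
def blended_cost_per_task(default_cost: float, premium_cost: float,
                          escalation_rate: float) -> float:
    """Average cost per task when escalated tasks pay for both calls."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return default_cost + escalation_rate * premium_cost


def break_even_escalation_rate(default_cost: float, premium_cost: float) -> float:
    """Escalation rate at which routing costs the same as premium-only.

    Solves default_cost + r * premium_cost = premium_cost for r.
    """
    if premium_cost <= 0:
        raise ValueError("premium_cost must be positive")
    return 1.0 - default_cost / premium_cost


# Hypothetical inputs in arbitrary cost units, not real prices.
example_blended = blended_cost_per_task(1.0, 10.0, 0.05)
example_break_even = break_even_escalation_rate(1.0, 10.0)
print(example_blended, example_break_even)
```

Under these made-up inputs, routing stays cheaper than a premium-only setup until escalation becomes very frequent, which is why the undisclosed Opus call rate is one of the key missing numbers.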

Still, founder-led infrastructure decisions like this can be more revealing than public benchmarks. Benchmarks are generic; production systems expose model behavior under messy prompts, user-specific context, retries, rate limits, and support costs. If Lindy can publish the engineering detail behind the switch, the interesting question will be what part of the gain came from DeepSeek V4 itself and what part came from Lindy's new tooling around routing, prompting, and failure detection.

For now, the lesson is pragmatic: Lindy is not betting that one model wins every task. It is betting that the default model can change, that expensive escalation can become marginal, and that the companies willing to rebuild their AI infrastructure may capture savings and product gains before their competitors do.
