Decagon puts its CX agents on a self-improvement loop

Duet Autopilot turns live support conversations into proposed agent updates, but Decagon is keeping humans in the approval seat.


Why it matters

Decagon is betting that the next advantage in enterprise AI support is not another chatbot, but the operating system that finds failures, tests fixes and keeps humans in control.

A technical diagram of a customer service feedback loop, showing AI-driven updates and human approval.

Decagon, led by Jesse Zhang and Ashwin Sreenivas, launched Duet Autopilot on June 9, adding a human-reviewed self-improvement loop to its enterprise customer-service agents, according to a company blog post.

The product is the next step in the system Zhang, Decagon's co-founder and CEO, and Sreenivas, its co-founder and president, have been building around a clear operating thesis: customer-support teams should define AI-agent behavior in natural language, observe what happens in production, and improve the agent without turning every edge case into an engineering project.

Decagon calls the new product "the first verified self-improving AI agent for CX." That claim is Decagon's; the launch materials do not independently establish the basis for the "first" label. The mechanics are more concrete. Autopilot analyzes production conversations, identifies failure patterns such as unnecessary human escalations, traces those problems back to the agent's underlying logic, proposes edits to Decagon's Agent Operating Procedures, and validates those edits before they reach production.

The product is aimed at the maintenance burden

Most enterprise AI-agent launches have the same hidden cost: the agent is not done when it goes live. It has to be watched, tuned, regression-tested and updated as policies change, customers phrase things differently, and the model makes mistakes in ways dashboards do not always capture.

Decagon has been selling against that workload with Duet, its workflow for building and improving agents. Autopilot pushes the same idea further by trying to close the loop between what customers say in live interactions and what the AI agent should do differently next time.

According to Decagon's Autopilot launch post, Autopilot tests proposed changes against the original conversation that surfaced the issue and against a "golden test set" of hundreds of conversations curated by Duet. If the change creates a regression or does not improve the target behavior, Autopilot is designed to iterate again. If it passes, the proposed update appears in a versioned workspace for review.

That last step is the important one. Autopilot is not being positioned as a system that silently rewrites production support behavior. Decagon says every proposed change requires human approval, and teams can set upfront constraints around brand voice, writing standards, policy preferences and procedures that Autopilot should not modify.
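The gating logic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Decagon's implementation: every name here (ProposedEdit, is_valid, first_valid_edit, ship, the passes callback, the locked set and the max_attempts cap) is hypothetical, and the launch materials do not say how Autopilot scores a conversation or how many times it retries.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Iterable, Optional, Sequence, Set


# Hypothetical types for illustration; none of this is Decagon's API.
@dataclass(frozen=True)
class ProposedEdit:
    procedure: str  # name of the procedure the edit would change
    new_text: str   # proposed replacement wording


# Assumed evaluator: True if the edited agent handles the conversation well.
Check = Callable[[ProposedEdit, str], bool]


def is_valid(
    edit: ProposedEdit,
    trigger_id: str,
    golden_ids: Sequence[str],
    passes: Check,
    locked: Set[str],
) -> bool:
    """Return True only if the edit is eligible for human review."""
    # Procedures the team marked as off-limits are never modified.
    if edit.procedure in locked:
        return False
    # The edit must fix the conversation that surfaced the issue.
    if not passes(edit, trigger_id):
        return False
    # The edit must not regress any conversation in the golden test set.
    return all(passes(edit, cid) for cid in golden_ids)


def first_valid_edit(
    candidates: Iterable[ProposedEdit],
    trigger_id: str,
    golden_ids: Sequence[str],
    passes: Check,
    locked: Set[str],
    max_attempts: int = 3,  # assumed cap; the source gives no number
) -> Optional[ProposedEdit]:
    """Try candidate edits in order, stopping at the first that validates."""
    for edit in islice(candidates, max_attempts):
        if is_valid(edit, trigger_id, golden_ids, passes, locked):
            return edit
    return None


def ship(edit: Optional[ProposedEdit], human_approved: bool) -> bool:
    """A validated edit still reaches production only with human approval."""
    return edit is not None and human_approved
```

The point of the sketch is the ordering of the gates: constraints first, then the triggering conversation, then the regression set, with human approval as a separate final step that validation alone cannot satisfy.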

Why Decagon is drawing the line at approval

The customer-experience market is a useful test case for autonomous agents because the feedback loop is unusually rich and unusually risky. Every chat, email, voice call or SMS interaction is both a training signal and a brand liability. A bad answer can waste a support agent's time; a wrong policy answer can create a compliance or customer-trust problem.

That is why the launch is less about full autonomy than about compressing the operational cycle. Decagon says AI can analyze 100% of interactions, but analysis alone still leaves support leaders with triage, prioritization, remediation and testing. Autopilot is meant to turn that pile of observations into specific, tested changes that a human can accept or reject.

In a thread on X, Zhang described the launch as part of a shift from quick inference calls toward longer-running autonomous agents as models get stronger. In customer operations, that shift does not mean letting the model do whatever it wants. It means using more expensive reasoning and evaluation in the background so the frontline agent gets better without a support-ops team manually replaying every failure.

The customer claims are still company-supplied

Decagon already markets itself around measurable support outcomes. Its homepage says Chime reports 70% chat and voice resolution, Duolingo sees an 80% deflection rate, ClassPass cites a 95% cost reduction, Oura shows a 3x increase in CSAT, and Hunter Douglas attributes $1M in revenue to fully AI-handled conversations. Those are Decagon-presented customer metrics, and the launch materials do not state which of those customers are already using Autopilot specifically.

Decagon has drawn backing from Andreessen Horowitz (a16z), Accel, Bain Capital Ventures, Coatue, and Index Ventures, which reflects how competitive customer-service automation has become as an enterprise AI wedge. The product question is no longer whether an AI agent can answer a support ticket. It is whether the vendor can prove that the agent improves over time without forcing a human team to become its permanent maintenance crew.

Autopilot is Decagon's answer to that maintenance problem. The open question is not whether self-improvement sounds useful. It is whether Decagon's validation layer can catch regressions reliably enough for enterprise teams to trust proposed changes in the workflows where support quality, policy adherence and customer experience collide.
