Fleet AI's Big Bet: Before Agents Run the Office, Send Them to the Gym

Insider round reportedly led by Sequoia, Bain, Menlo, and SV Angel; revenue run-rate climbed from $1M six months ago to $63M now and is tracking toward $160M next quarter, as Ouporov also previews a benchmark exposing spatial failures in frontier models.

Scoop: RuntimeWire original reporting.

Why it matters

A reported $45M insider-led Series A at a $725M valuation, paired with eye-popping revenue run-rate claims, signals heavy investor conviction in Fleet's applied-AI approach. The founder-led hiring push for experienced operators suggests the company is scaling product and go-to-market aggressively.

RuntimeWire Investigative Report

Reporting note: The Series A and revenue run-rate figures below are source-based and should remain attributed unless confirmed on record by Fleet or its investors.

Fleet AI is not building another chatbot. It is building the place where chatbots, browser agents, coding agents, and future office agents go to fail before they are trusted with real work.

That is the simplest way to understand the company behind one of the sharpest hiring-velocity spikes in RuntimeWire's 90-day startup screen. Fleet describes itself publicly as a company that creates "simulated worlds and real-world challenges" to understand and shape the behavior of artificial intelligence systems. On its About page, the company is even blunter: it says it is building "training gyms for agents," high-fidelity simulation environments where AI systems can practice tasks while humans supervise. (Fleet, Fleet About)

According to a person familiar with the financing, Fleet has closed an unannounced $45 million Series A at a $725 million valuation, led by insider backers including Sequoia, Bain, Menlo, and SV Angel. The same person said Fleet's revenue run-rate has accelerated from about $1 million six months ago to roughly $63 million now, with the company tracking toward $160 million next quarter. Those figures are not public. But market-intelligence firm Sacra reported publicly in April that Fleet had reached an estimated $60 million in annualized revenue, up from $1 million in 2025, and was in talks to raise at least $50 million at around a $750 million valuation, with Bain Capital Ventures in talks to lead and existing investors Sequoia, Menlo, and SV Angel expected to participate. (Sacra)
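
To put the source-reported figures in perspective, the short Python sketch below works out the compounding they would imply. It uses only the unconfirmed numbers quoted above. The constant-monthly-growth model and the reading of "next quarter" as three months are simplifying assumptions, not anything Fleet or its investors have stated.

```python
# Unconfirmed, source-reported figures from the paragraph above, in $ millions.
RUN_RATE_SIX_MONTHS_AGO = 1.0
RUN_RATE_NOW = 63.0
RUN_RATE_NEXT_QUARTER = 160.0
VALUATION = 725.0


def monthly_growth(start, end, months):
    """Constant monthly multiplier that compounds `start` into `end`."""
    return (end / start) ** (1 / months)


# Growth over the past six months.
past = monthly_growth(RUN_RATE_SIX_MONTHS_AGO, RUN_RATE_NOW, 6)

# Assumption: "next quarter" is treated as three months from now.
ahead = monthly_growth(RUN_RATE_NOW, RUN_RATE_NEXT_QUARTER, 3)

# Valuation as a multiple of the current reported run-rate.
multiple = VALUATION / RUN_RATE_NOW

print(f"Past six months: {past:.2f}x per month")
print(f"Implied next quarter: {ahead:.2f}x per month")
print(f"Valuation / run-rate: {multiple:.1f}x")
```

Under those assumptions, the reported run-rate has roughly doubled every month for six months, the $160 million target implies growth slowing to about 1.36x per month, and the reported valuation works out to about 11.5 times the current run-rate.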

The numbers are startling because Fleet, at least publicly, still looks small. Its careers page lists a compact set of open roles and frames the company as a team "shaping safe and capable AI" by creating a new category of work around auditing, training, and steering frontier systems. Public LinkedIn data is noisy: it lists Fleet's company size as 11 to 50 employees while also showing 132 associated profiles. But the contradiction itself is useful. Fleet appears to be moving from a research lab into a high-touch deployment organization, where hiring speed can matter as much as model quality. (Fleet Careers, LinkedIn)

What is an "AI gym"?

The term "gym" has a specific lineage in AI. OpenAI Gym, released in 2016, gave reinforcement-learning researchers a common toolkit of environments where agents could be tested on standardized tasks, from games to robot simulations. The point was not the game itself. The point was repeatability: the same environment, the same action space, the same reward signal, and a way to compare progress. (OpenAI Gym)

Fleet is applying that idea to knowledge work.

A Fleet-style gym is not just a benchmark question with a right answer. It is a working simulation of a task environment. An agent is given a goal. It observes the world through a browser, software interface, document, spreadsheet, database, or visual canvas. It takes actions. The environment changes. A checker determines whether the agent actually completed the task, not whether it sounded persuasive while describing a plan.
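
The loop described above (goal, observation, action, state change, outcome check) can be sketched in a few lines of Python. This is a toy illustration, not Fleet's SDK or the OpenAI Gym API: `TicketEnv`, `run_episode`, and both policies are hypothetical names invented here, and the "workflow" is just a dictionary of support tickets.

```python
import random
from dataclasses import dataclass, field


@dataclass
class TicketEnv:
    """Hypothetical toy environment. Goal: leave every ticket closed."""
    seed: int = 0
    tickets: dict = field(default_factory=dict)

    def reset(self):
        # The same seed always rebuilds the same starting state.
        rng = random.Random(self.seed)
        self.tickets = {f"T{i}": rng.choice(["open", "closed"]) for i in range(5)}
        self.tickets["T0"] = "open"  # guarantee there is work to do
        return dict(self.tickets)

    def step(self, action):
        # Only a real "close" action changes the world; anything else is a no-op.
        kind, ticket_id = action
        if kind == "close" and ticket_id in self.tickets:
            self.tickets[ticket_id] = "closed"
        return dict(self.tickets)

    def check(self):
        # The checker inspects state, not what the agent says it did.
        return all(status == "closed" for status in self.tickets.values())


def run_episode(env, policy, max_steps=10):
    """Run one attempt and return whether the checker passes."""
    obs = env.reset()
    for _ in range(max_steps):
        if env.check():
            break
        obs = env.step(policy(obs))
    return env.check()


def careful_policy(obs):
    """Closes the first ticket it observes as open."""
    for ticket_id, status in obs.items():
        if status == "open":
            return ("close", ticket_id)
    return ("noop", None)


def talkative_policy(obs):
    """Describes a plan but never acts on the environment."""
    return ("describe_plan", None)


print(run_episode(TicketEnv(seed=1), careful_policy))    # True
print(run_episode(TicketEnv(seed=1), talkative_policy))  # False
```

The second policy fails because the checker looks only at the final state of the tickets. The seeded `reset` is what makes attempts comparable: every agent starts from the same world.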

Sacra describes Fleet's product as reinforcement-learning environments for enterprise workflows, including simulated replicas of software such as Salesforce and Excel. In that model, a customer selects a business process, Fleet configures a high-fidelity environment around it, agents practice inside the environment, humans supervise, and the environment can reset for another attempt. (Sacra)

Fleet's public developer artifacts point in the same direction. Its Python SDK lets users create and reset environments, set seeds and timestamps, inspect state, and connect to running instances. One Fleet-linked repository, "gym-anything," describes the goal more expansively: turn software into an agent environment, including browsers, IDEs, medical-record systems, CAD tools, and learning-management software. The example task has an agent interact through screenshots, mouse movement, and keyboard actions while an automatic checker grades the outcome. (Fleet SDK, gym-anything)

That is the "AI gym" thesis in one sentence: frontier models do not just need more text, they need practice fields.

For enterprises, that could become a new layer in the AI stack. Before an agent touches production workflows, it can be trained and evaluated in a synthetic replica. Before a lab ships a general-purpose computer-use model, it can test whether the agent reliably handles state, recovers from mistakes, and finishes long-horizon work. Before a company trusts an agent to update customer records, process invoices, reconcile spreadsheets, or operate a browser, it can make the agent prove itself in a sandbox that behaves like the real thing.

The business opportunity is obvious. The hard part is making the gyms realistic enough to matter.

The founder: ballet, robotics, and the art of controlled failure

Fleet's founder, Nicolai Ouporov, is an unusual fit for a company that sits at the intersection of simulation, agents, and work. His personal website identifies him as an "artist and researcher" interested in "the friction between machine and human intelligence," and lists Fleet alongside prior work at Respell, the Stanford Robotics and Embodied Artificial Intelligence Lab, Columbia's Creative Machines Lab, and intensive professional ballet training with Boston Ballet, San Francisco Ballet, and Ballet West. (Nicolai Ouporov)

That biography is not decorative. It explains the product.

Ouporov's earlier public profile shows a long-running obsession with bodies, tools, form, and feedback. In a Columbia arts interview, he described a 14-year pre-professional ballet background, visual work in sculpture and photography, and an interest in how the human body intersects with technology. He also framed artists as entrepreneurs, people who build systems around uncertainty and assemble the resources needed to make new work real. (Rat Rock Magazine)

In another interview, Ouporov said his robotics path began through art. He described art and scientific inquiry as linked, and said he learned embedded systems while building an art project involving mechatronics. That matters because Fleet's product is basically an industrialized version of that same loop: observe, act, evaluate, adjust, repeat. (Rat Rock Magazine)

Before Fleet, Ouporov was the founding engineer and first hire at Respell, a generative-AI automation startup that Salesforce later acquired. Respell positioned itself as a way for non-technical users to build AI-powered workflows, and Salesforce said the team would join its Agentforce effort. (Respell, TechCrunch)

That experience gave Ouporov a front-row seat to a problem every enterprise AI buyer now understands: demos are easy, reliable automation is hard. A model can write the email, but can it know which account record to update? It can summarize a ticket, but can it decide whether to escalate? It can operate a browser, but can it notice that it clicked the wrong tab three steps ago and repair the plan?

Fleet's answer is not just a better prompt. It is an environment.

Printing Machines is the public clue

Ouporov's recent "Printing Machines" benchmark, built with Fleet and collaborators James Zhou and Jerry Zhou, is the clearest public window into how he thinks.

The benchmark asks frontier models to reproduce target images using drawing tools, step by step. The authors describe drawing as a demanding test of perception, planning, execution, and memory, because the model must understand a target, choose actions, observe the partial result, and correct mistakes. The tool interface includes line and curve operations, undo, notes, and a scratchpad. (Printing Machines)
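
A minimal Python sketch of that kind of interface appears below. It is an illustration under stated assumptions, not the Printing Machines implementation: the `Canvas` class, its pixel grid, and the overlap score are all hypothetical, and the real benchmark's tools and grading may differ.

```python
class Canvas:
    """Hypothetical drawing surface with a line tool and undo."""

    def __init__(self):
        self.pixels = set()   # coordinates currently filled in
        self.history = []     # pixels newly added by each stroke

    def line(self, x0, y0, x1, y1):
        # Sample evenly spaced points between the endpoints and round to the grid.
        steps = max(abs(x1 - x0), abs(y1 - y0))
        if steps == 0:
            points = {(x0, y0)}
        else:
            points = {
                (round(x0 + (x1 - x0) * i / steps),
                 round(y0 + (y1 - y0) * i / steps))
                for i in range(steps + 1)
            }
        added = points - self.pixels  # record only pixels this stroke created
        self.pixels |= added
        self.history.append(added)

    def undo(self):
        # Remove only what the last stroke added, leaving earlier strokes intact.
        if self.history:
            self.pixels -= self.history.pop()


def overlap_score(drawn, target):
    """Intersection over union of two pixel sets (1.0 is a perfect match)."""
    union = drawn | target
    return len(drawn & target) / len(union) if union else 1.0


# Target: the outline of a square with corners at (1, 1) and (4, 4).
target = {(x, y) for x in range(1, 5) for y in range(1, 5)
          if x in (1, 4) or y in (1, 4)}

canvas = Canvas()
for stroke in [(1, 1, 4, 1), (4, 1, 4, 4), (4, 4, 1, 4), (1, 4, 1, 1)]:
    canvas.line(*stroke)
print(overlap_score(canvas.pixels, target))  # 1.0

canvas.line(0, 0, 5, 5)                      # a stray diagonal stroke
print(overlap_score(canvas.pixels, target))  # 0.75

canvas.undo()                                # recover from the mistake
print(overlap_score(canvas.pixels, target))  # 1.0
```

Even in this toy version, a good score depends on the whole sequence: one stray stroke lowers it, and recovering means noticing the error and undoing it.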

The results were not flattering to frontier models. The blog says models struggled to produce consistent drawings, diverged from targets over time, misidentified parts of images, and failed at geometric precision. One broader conclusion is especially relevant to Fleet: performance on static visual tasks does not guarantee robustness under incremental interaction. (Printing Machines)

That is a sharp critique of the current benchmark culture. Many AI tests still reward models for answering a question correctly in one shot. Fleet's work is aimed at something more physical, even when the environment is digital. Can the system perceive its own output? Can it recover from error? Can it follow constraints? Can it operate tools without drifting?

For Ouporov, the drawing benchmark also reads like a continuation of his arts background. A ballet dancer trains through repetition inside a constrained space. A visual artist learns by making marks, seeing what changed, and revising. A roboticist designs environments where embodied systems learn through interaction. Fleet compresses those patterns into enterprise AI infrastructure.

Why Fleet may be growing so fast

Fleet's momentum comes from a timing mismatch in the AI market.

Model companies want agents that can do work. Enterprises want agents that can be trusted. Neither side gets there by relying only on chat transcripts or static exams. They need environments where agents can practice real tasks and produce measurable traces of what went wrong.

That creates demand for three things Fleet appears to be packaging together: simulated software environments, human supervision, and reinforcement-learning infrastructure. Public repositories tied to Fleet include environment tooling, post-training infrastructure, and forks related to agent environments and reinforcement learning. One linked full-stack reinforcement-learning library describes components for training long-horizon agents, including environments for math, coding, search, SQL, and tool-use tasks. (Fleet GitHub, harbor-train)

This also explains the unusual hiring pattern. Fleet does not only need machine-learning researchers. It needs people who can model business workflows, build realistic software replicas, design evaluation tasks, manage data pipelines, deploy with customers, and coordinate human feedback. Its careers page mentions roles in deployments, generalist engineering, environment engineering, and domain-specific work, which fits the profile of a company building bespoke practice arenas faster than customers can define them. (Fleet Careers)

The risk is that this becomes a services business wearing a platform costume. High-fidelity simulations are labor-intensive. Every enterprise workflow has edge cases, permissions, messy data, exceptions, and undocumented human judgment. If Fleet has to handcraft too much for each customer, revenue can grow quickly while margins lag behind.

The counterargument is that the early service work may become the data moat. Each environment teaches Fleet how to model a category of work. Each failure mode becomes a reusable test. Each resettable workflow becomes a training asset. If Fleet can standardize enough of that process, it could become infrastructure for the agent economy rather than a consulting shop for frontier labs.

The real question Fleet is asking

Most AI startups are selling intelligence as an interface. Fleet is selling practice.

That distinction is important. The next bottleneck in AI may not be whether a model can answer harder questions. It may be whether an agent can act repeatedly inside messy systems without losing the plot. The difference between those two capabilities is the difference between a chatbot and a worker.

Ouporov's background makes Fleet's strategy feel less accidental than it first appears. The former ballet dancer knows discipline through repetition. The visual artist knows that perception is active, not passive. The robotics researcher knows intelligence has to survive contact with an environment. The former Respell engineer knows enterprise automation breaks when abstract reasoning meets real software.

Fleet's wager is that the AI industry is about to need training grounds as much as it needs models. If the reported financing and revenue acceleration are accurate, investors are already treating that wager as one of the next major infrastructure bets in AI.

The phrase "AI gym" sounds almost playful. Fleet is betting it becomes the place where the next generation of agents learns how to work.
