Yohei Nakajima's ActiveGraph Turns the Agent Trace Into the Runtime

The BabyAGI creator's May 21st arXiv paper argues for agents built around append-only logs, replay, forking, and lineage.

Why it matters

Agent infrastructure is shifting from demos toward systems that can be audited, replayed, and debugged. ActiveGraph is small, but it frames that shift around a concrete primitive: the log as the runtime.

Yohei Nakajima (@yoheinakajima), the investor-builder behind BabyAGI, has put a sharper architecture around his agent work with ActiveGraph, an open-source Python runtime that treats an append-only event log as the source of truth for long-running AI agents.

The core work arrived on arXiv on May 21st, 2026, which makes the paper about six weeks old as of July 5th. It is back in view because ActiveGraph is no longer just a paper artifact. The GitHub repository shows a public Apache-2.0 implementation with 352 stars, 27 forks, 181 commits, and a v1.2.0 release dated July 3rd, and PyPI lists activegraph 1.2.0 as the current package release.

Nakajima is a familiar name to builders who spent 2023 experimenting with autonomous agents. His earlier BabyAGI project helped define the first wave of task-loop agents, where a system repeatedly executes a task, summarizes the result, and generates follow-up work. ActiveGraph is his attempt to preserve the ambition of those early agents while addressing the operational problem they exposed: once an agent runs for any length of time, the system needs to know exactly what happened, why it happened, and what would have changed under a different branch.

Nakajima's background matters here because ActiveGraph is aimed less at demo agents than at the messy evaluation work operators actually do. Scrum Ventures lists him as a venture partner who has supported early-stage startups for 15 years, helped spin up the Disney Accelerator at Techstars, later served as Techstars' Director of Pipeline across 30-plus accelerator programs, and led Scrum Studio partner programs with Nintendo, Dentsu, and Panasonic. The paper's worked example is an investment diligence pack.

The bet: make the log the system

Most agent frameworks start with the model call. The developer then adds tool use, routing, memory, policies, observability, and a database. ActiveGraph reverses that order. In the paper's HTML version, Nakajima describes a runtime where every meaningful change is an event: the goal, the rules, the tools available, the tool responses, the objects created, the relations between those objects, and the artifacts the agent produces.

The graph that the agent reads is a deterministic projection of that event log. Behaviors can be plain functions, classes, LLM-backed routines, or logic attached to typed edges. They react to changes in the graph and write new events back to the log. The paper states the coordination model plainly: components do not instruct one another directly; they coordinate through shared graph state.
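To make the projection idea concrete, here is a minimal sketch in plain Python. It is not ActiveGraph's API: the Event and Graph classes, the event kinds, and the flag_unsupported_claims behavior are all hypothetical names chosen for illustration. It shows only the general pattern the paper describes, in which the graph is folded from the log and a behavior reads the graph and returns new events instead of calling other components.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Event:
    """One entry in the append-only log (hypothetical shape, not ActiveGraph's)."""
    kind: str                  # "object_created" or "relation_added"
    payload: dict[str, Any]


@dataclass
class Graph:
    objects: dict[str, dict[str, Any]] = field(default_factory=dict)
    edges: set[tuple[str, str, str]] = field(default_factory=set)


def project(log: list[Event]) -> Graph:
    """Fold the log into a graph. The same log always yields the same graph."""
    graph = Graph()
    for event in log:
        p = event.payload
        if event.kind == "object_created":
            graph.objects[p["id"]] = dict(p["attrs"])
        elif event.kind == "relation_added":
            graph.edges.add((p["src"], p["rel"], p["dst"]))
    return graph


def flag_unsupported_claims(graph: Graph) -> list[Event]:
    """A behavior: reads graph state and returns new events, never mutating the graph."""
    supported = {dst for (_, rel, dst) in graph.edges if rel == "supports"}
    new_events = []
    for obj_id, attrs in sorted(graph.objects.items()):
        flag_id = f"flag:{obj_id}"
        is_unsupported_claim = attrs.get("type") == "claim" and obj_id not in supported
        if is_unsupported_claim and flag_id not in graph.objects:
            new_events.append(
                Event("object_created",
                      {"id": flag_id, "attrs": {"type": "flag", "about": obj_id}})
            )
    return new_events


log = [
    Event("object_created", {"id": "claim:1", "attrs": {"type": "claim"}}),
    Event("object_created", {"id": "claim:2", "attrs": {"type": "claim"}}),
    Event("object_created", {"id": "ev:1", "attrs": {"type": "evidence"}}),
    Event("relation_added", {"src": "ev:1", "rel": "supports", "dst": "claim:1"}),
]

# The behavior's output is appended to the log; the graph is then re-projected.
log.extend(flag_unsupported_claims(project(log)))
graph = project(log)
```

In this toy version the behavior is idempotent: once the flag for claim:2 is in the log, running the behavior again against the new projection returns no further events.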

That design gives ActiveGraph three primitives Nakajima is trying to make central: deterministic replay, forking from any event, and lineage from a goal down to the individual model call or tool response that produced an artifact. The project README reduces the idea to a simple notion: the graph is the world, behaviors are the rules, and the trace is the proof.

The arXiv comments note an open-source implementation with a reproducible quickstart demo, deterministic replay, fork-and-diff, and lineage tracing. The reference Diligence pack runs against recorded fixtures. The tutorial includes fork-and-diff, where a user branches an agent run at a historical event and compares the resulting structure against the parent. The paper frames forking as cheap because the shared prefix does not re-execute.
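A rough sketch of what fork-and-diff can look like over a toy log follows. The event format (key, value pairs), the function names, and the sample data are assumptions made for this example, not the tutorial's actual interface. The point it illustrates is that a fork copies the recorded prefix as data, so nothing in the shared history has to run again.

```python
def project(log):
    """Fold (key, value) events into a state dict; later events overwrite earlier ones."""
    state = {}
    for key, value in log:
        state[key] = value
    return state


def fork(parent_log, at, new_events):
    """Branch after the first `at` events of the parent.

    The prefix is copied as recorded, so no step in it is re-executed.
    The parent log itself is left untouched.
    """
    if not 0 <= at <= len(parent_log):
        raise ValueError(f"fork point {at} is outside the parent log")
    return list(parent_log[:at]) + list(new_events)


def diff(parent_log, child_log):
    """Compare the projected state of a branch against its parent."""
    parent, child = project(parent_log), project(child_log)
    return {
        "added": {k: child[k] for k in child.keys() - parent.keys()},
        "removed": {k: parent[k] for k in parent.keys() - child.keys()},
        "changed": {
            k: (parent[k], child[k])
            for k in parent.keys() & child.keys()
            if parent[k] != child[k]
        },
    }


parent = [
    ("goal", "assess Acme"),
    ("model", "large"),
    ("memo.risk", "high"),
    ("memo.summary", "pass"),
]

# Counterfactual: keep the goal, swap the model, and record what the branch produced.
child = fork(parent, 1, [("model", "small"), ("memo.risk", "medium")])
result = diff(parent, child)
```

Here the diff reports that the branch changed the model and the risk rating and never produced a summary, which is the kind of structural comparison the tutorial describes.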

Why the fork matters

Forking is the most commercially interesting part of the architecture because it maps to how teams already debug judgment work. A product team does not simply ask whether an agent completed a task. It asks why the output changed after a prompt edit, which evidence the agent relied on, whether a policy intervention would have stopped a bad action, and whether a cheaper model could have produced the same artifact.

ActiveGraph turns those questions into graph and log operations. If the event log is complete enough, a run can be replayed. If model and tool outputs are cached under a determinism contract, the replay can avoid new nondeterministic calls. If a run can be branched at event 42, an operator can test a counterfactual without re-running the entire prefix.
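The caching condition is the part that is easiest to show in code. The sketch below is one possible way to record tool responses and replay them without live calls. The RecordedTools class, the call_key helper, and the fetch_score tool are hypothetical and are not taken from ActiveGraph. It assumes a deliberately simple determinism contract: a call is identified by its tool name and arguments, and replay refuses any call that has no recorded response.

```python
import hashlib
import json
import random


def call_key(tool_name, args):
    """Stable key for a tool call: the same name and arguments give the same key."""
    blob = json.dumps([tool_name, args], sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


class RecordedTools:
    """Wraps tools so every response is recorded and can be replayed later."""

    def __init__(self, tools, recorded=None, live=True):
        self.tools = tools
        self.recorded = dict(recorded or {})
        self.live = live
        self.live_calls = 0

    def call(self, name, **args):
        key = call_key(name, args)
        if key in self.recorded:
            return self.recorded[key]          # replay path: no tool runs
        if not self.live:
            raise LookupError(
                f"no recorded response for {name}({args}); "
                "replay would need a new nondeterministic call"
            )
        self.live_calls += 1
        result = self.tools[name](**args)
        self.recorded[key] = result
        return result


def fetch_score(company):
    """Stand-in for a nondeterministic tool or model call."""
    return random.random()


# First run: live, responses are recorded.
first_run = RecordedTools({"fetch_score": fetch_score})
original = first_run.call("fetch_score", company="Acme")

# Replay: no tools registered at all, only the recorded responses.
replay = RecordedTools({}, recorded=first_run.recorded, live=False)
replayed = replay.call("fetch_score", company="Acme")
```

The design choice worth noting is the hard failure on a missing record. A replay that silently fell back to a live call would no longer be a replay.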

That is where ActiveGraph differs from memory-first agent positioning. Memory layers tend to emphasize what the agent can recall across turns. ActiveGraph emphasizes whether the system can reconstruct the causal path by which an output came to exist. The Diligence pack makes the distinction concrete. The paper says its diligence example preserves the full causal structure behind a memo from the high-level goal to the artifacts, reconstructable from the log alone.
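"Reconstructable from the log alone" can be illustrated with a small lineage walk. This is a sketch under stated assumptions, not the Diligence pack's schema: it assumes each event carries an id and a single caused_by pointer to its parent event, and the event kinds and the lineage function are made up for the example. A real system may well allow several causes per event.

```python
def lineage(events, event_id):
    """Return the causal chain from the root event down to `event_id`.

    Assumes each event has at most one parent, stored under "caused_by".
    """
    by_id = {event["id"]: event for event in events}
    chain = []
    seen = set()
    current = event_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in causal links at event {current}")
        seen.add(current)
        event = by_id[current]
        chain.append(event)
        current = event.get("caused_by")
    chain.reverse()  # root (the goal) first, artifact last
    return chain


events = [
    {"id": 1, "kind": "goal", "caused_by": None},
    {"id": 2, "kind": "tool_call", "caused_by": 1},
    {"id": 3, "kind": "tool_response", "caused_by": 2},
    {"id": 4, "kind": "model_call", "caused_by": 3},
    {"id": 5, "kind": "artifact", "caused_by": 4},
    {"id": 6, "kind": "tool_call", "caused_by": 1},  # unrelated branch of work
]

path = [event["kind"] for event in lineage(events, 5)]
```

Walking back from the artifact yields the goal, the tool call, its response, and the model call that produced the memo, and it leaves out the unrelated branch.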

ActiveGraph supports Python 3.11+, ships a core runtime with SQLite storage and the Diligence pack, and organizes domain logic into packs. The reference Diligence pack includes 8 object types, 7 behaviors, 3 tools, and recorded fixtures.

A small project in a crowded market

ActiveGraph is entering a market where agent infrastructure is already funded, packaged, and crowded. LangGraph, from LangChain, positions itself as an agent runtime and low-level orchestration framework with durable execution, memory, human-in-the-loop controls, and production tooling. LangChain said in October 2025 that it raised $125 million at a $1.25 billion valuation, with IVP leading and Sequoia, Benchmark, Amplify, CapitalG, and Sapphire Ventures participating.

Memory vendors are also taking a separate slice of the same problem. Mem0 announced a $24 million Series A to build a memory layer for agents. CrewAI has been selling into the control-plane side of agent operations, with a November 2025 announcement around its agent operations platform.

ActiveGraph is smaller and earlier than those companies. There is no verified ActiveGraph financing, valuation, customer list, revenue, headcount, incorporation, or hosted product. The available evidence points to an open-source project led by Nakajima, with a paper, documentation, package releases, and a growing public repository. That narrowness is also the source of the project's clarity. ActiveGraph is not trying to own the entire agent engineering stack. It is making one strong claim: the audit trail should be the runtime substrate, rather than a diagnostic layer added after the model loop.

The paper frames its contributions as architectural rather than as performance results. It does not report that ActiveGraph improves accuracy over a baseline, and it discusses self-improving agents as an affordance rather than as a demonstrated result. That caveat is important because the 2023 BabyAGI wave taught builders how quickly agent demos can outrun practical reliability.

The founder's throughline

BabyAGI made the appeal of autonomous task loops obvious. It also made their weaknesses visible. Systems that continue generating and executing tasks need budgets, stopping conditions, provenance, and a way to understand why an output appeared. The most useful thing about ActiveGraph is that it reads like Nakajima's answer to what happened after the initial agent excitement met real operating constraints.

His investor background also explains the diligence example. A venture workflow is full of claims, evidence, contradictions, memos, and judgment calls. It is an unusually good test bed for lineage because the output is valuable only if a human can inspect how the agent got there. A diligence memo without evidence trails is just prose. A diligence memo with replayable causality becomes something an operator can audit, fork, and challenge.

That does not make ActiveGraph a company yet. It does make it a precise founder bet. Nakajima is taking the early agent loop he helped popularize and replacing the fragile center of gravity. The model still matters, but the durable object is the log: the record that lets a team replay the run, branch it, and prove where the answer came from.
