Hermes Agent's new async subagents take aim at the blocking-agent problem

Teknium says the delegate tool can now fan out work without freezing the chat, a practical change for long-running agent workflows.

By Ryan Merket · Published Jun 15, 2026, 6:35pm CT

Why it matters

Asynchronous subagents turn Hermes delegation from a blocking trick into a usable workflow primitive, putting pressure on open-agent runtimes to manage parallel work, not just call tools.

Illustration: A lone AI engineer or 'main agent' overseeing a complex, distributed system.

Teknium (@Teknium), the Nous Research cofounder and head of post-training behind the Hermes model line, said Monday that Hermes Agent now supports asynchronous subagents, moving one of the open-source agent's core orchestration tools out of the wait-until-it-finishes pattern that has constrained multi-agent work.

The change lands in the existing delegate_task path, the tool Hermes Agent uses to spawn child agents for parallel work. In Teknium's words, the delegate tool "no longer blocks your chat." Users can get the update by running hermes update, according to his post.

ELI5

Before this update, asking Hermes Agent to hand work to helper agents could make the main chat wait, like sending assistants to research something and being unable to talk to the manager until they came back. With asynchronous subagents, the helpers can work in the background while the user keeps using the chat. That matters because multi-agent workflows are only useful if the operator can keep steering the work instead of staring at one long-running task.

https://x.com/Teknium/status/2066619275989991861

That is a small interface change with a larger systems implication. Hermes Agent already let a parent agent split a job into subagents, each with its own conversation, terminal session, and toolset. The value was isolation: the child agent could review code, research a topic, or compare alternatives without dragging every intermediate tool call into the main context window. The tradeoff, documented in Hermes' own delegation guide, was that delegate_task was treated as a synchronous, turn-scoped mechanism: if the parent turn was interrupted, active child work could be cancelled and discarded.

Teknium's update targets the bottleneck operators feel first, not the one architecture diagrams emphasize. If an agent needs to fan out a dozen audits, searches, or refutations, a blocked chat is the practical failure mode: the user cannot steer, interrupt, or carry on with other work while the children run. Asynchronous subagents make delegation look less like a single blocking tool call and more like an operating surface for work in flight.
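The difference between the two patterns can be sketched in a few lines of Python. This is an illustration of blocking versus non-blocking fan-out in general, not Hermes Agent's code: run_subagent, blocking_turn, and non_blocking_turn are hypothetical names, and the "work" is simulated with a short sleep.

```python
import asyncio

CHAT_EVENT = "chat: user message handled"

async def run_subagent(name: str, seconds: float) -> str:
    """Hypothetical stand-in for a child agent; it sleeps instead of working."""
    await asyncio.sleep(seconds)
    return f"{name}: done"

async def blocking_turn(jobs):
    """Blocking pattern: the parent waits for every child before the chat moves on."""
    events = list(await asyncio.gather(*(run_subagent(n, s) for n, s in jobs)))
    events.append(CHAT_EVENT)
    return events

async def non_blocking_turn(jobs):
    """Non-blocking pattern: children run as background tasks while the chat continues."""
    events = []
    tasks = [asyncio.create_task(run_subagent(n, s)) for n, s in jobs]
    events.append(CHAT_EVENT)  # handled before any child has finished
    for finished in asyncio.as_completed(tasks):
        events.append(await finished)  # results are collected as they arrive
    return events

JOBS = [("audit", 0.05), ("search", 0.01)]
blocking_events = asyncio.run(blocking_turn(JOBS))
async_events = asyncio.run(non_blocking_turn(JOBS))
```

In the blocking version the chat event is always last in the list; in the non-blocking version it is first, and the child results trickle in afterward.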

The post also clarifies where Nous is drawing the next boundary. Teknium told one user there is already a maximum concurrent subagents setting in the config and dashboard. He told another that subagent timeouts are configurable. On tool access, he said subagents will have access to their MCP tools. On nesting, he said both width and depth are now unbounded, and that users can allow subagents to call subagents of their own.

Those replies matter because agent orchestration tends to fail at the edges: too many parallel children, unclear timeout behavior, tool permissions that silently differ from the parent, or nested workers that create cost and state explosions. Hermes' published docs have historically described default concurrency of three parallel subagents, configurable through delegation.max_concurrent_children, and opt-in nested delegation through max_spawn_depth. Teknium's Monday replies suggest the implementation is being opened further, while still relying on explicit configuration rather than pretending fan-out is free.
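To make the two limits concrete, here is a minimal sketch of how a concurrency cap and a depth cap might be enforced. The field names echo the config keys mentioned above, but the DelegationLimits and Delegator classes are hypothetical, the depth default of 1 (no nesting) is an assumption, and nothing here is taken from Hermes Agent's implementation.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class DelegationLimits:
    # Names mirror the documented config keys; the class and defaults are assumptions.
    max_concurrent_children: int = 3
    max_spawn_depth: int = 1  # assumed: depth 1 means children cannot spawn children

class Delegator:
    """Hypothetical dispatcher that caps parallel children and nesting depth."""

    def __init__(self, limits: DelegationLimits):
        self.limits = limits
        self._slots = asyncio.Semaphore(limits.max_concurrent_children)
        self.running = 0
        self.peak = 0  # highest number of children observed running at once

    async def delegate(self, name: str, depth: int = 1, seconds: float = 0.01) -> str:
        if depth > self.limits.max_spawn_depth:
            raise RuntimeError(f"{name}: spawn depth {depth} exceeds limit")
        async with self._slots:  # waits here when all slots are taken
            self.running += 1
            self.peak = max(self.peak, self.running)
            try:
                await asyncio.sleep(seconds)  # simulated child work
                return f"{name} finished at depth {depth}"
            finally:
                self.running -= 1

async def demo():
    delegator = Delegator(DelegationLimits())
    results = await asyncio.gather(
        *(delegator.delegate(f"child-{i}") for i in range(8))
    )
    try:
        await delegator.delegate("grandchild", depth=2)
        nested_allowed = True
    except RuntimeError:
        nested_allowed = False
    return results, delegator.peak, nested_allowed

results, peak, nested_allowed = asyncio.run(demo())
```

Eight children are requested, but no more than three run at once, and the nested request is refused until the depth limit is raised. That is the sense in which fan-out stays an explicit configuration choice.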

The update also sits next to a second, related effort: dynamic workflows. In a reply, Teknium pointed users to a Hermes Agent pull request for a dynamic-workflow skill. That PR, opened June 2, frames the work as a way to move the plan, loop, and intermediate state out of the context window and into code, while keeping LLM judgment in delegate_task batches. The PR explicitly separates deterministic fan-out through execute_code from LLM-judgment fan-out through delegated subagents.
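The split the PR describes can be pictured roughly as follows. This is a loose sketch of the idea, not the PR's code: run_workflow, word_count, and fake_judge are hypothetical, and fake_judge is a trivial stand-in for what would be a batch of delegated subagents.

```python
from typing import Callable

def word_count(text: str) -> int:
    """Deterministic step: no model judgment needed, so it runs as plain code."""
    return len(text.split())

def fake_judge(prompts: list[str]) -> list[str]:
    """Hypothetical stand-in for a batch of LLM-judgment subagents."""
    return ["needs review" if "TODO" in p else "ok" for p in prompts]

def run_workflow(
    docs: dict[str, str],
    judge: Callable[[list[str]], list[str]],
) -> dict[str, dict]:
    # The plan and intermediate state live in this dict, not in a context window.
    state: dict[str, dict] = {
        name: {"words": word_count(text)} for name, text in docs.items()
    }
    # Deterministic filter: only documents with at least three words go to judgment.
    pending = [name for name, info in state.items() if info["words"] >= 3]
    # Judgment fan-out: one batch call covers every pending document.
    verdicts = judge([docs[name] for name in pending])
    for name, verdict in zip(pending, verdicts):
        state[name]["verdict"] = verdict
    return state

DOCS = {
    "a.md": "short note",
    "b.md": "TODO finish this section",
    "c.md": "all checks pass here",
}
report = run_workflow(DOCS, fake_judge)
```

The loop, the filter, and the bookkeeping are ordinary code that can be rerun and inspected; only the step that needs judgment is handed to a model.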

That distinction is the real product bet. A model that can call tools is useful. An agent that can preserve state, dispatch independent workers, verify their outputs, and keep the user in the loop starts to resemble a lightweight operating environment. Hermes Agent's GitHub repository already pitches the project around memory, skills, scheduled automations, messaging surfaces, terminal backends, and model-provider choice. The repository lists support for providers including Nous Portal, OpenRouter, NVIDIA NIM, Hugging Face, OpenAI-compatible endpoints, and others, and its README describes Hermes as able to spawn isolated subagents for parallel work.

Teknium's role is central to why this update carries weight inside the open-agent community. His personal site describes him as Nous Research's cofounder and head of post-training, focused on post-training, alignment, and generalist LLMs, and says he built and stewarded the Hermes model family after helping take Nous from a Discord research collective into a company. Hermes Agent is the product expression of that thesis: not merely a hosted chatbot, but an agent runtime that can live across terminal, desktop, dashboard, and messaging channels.

The timing also follows a fast release cadence. GitHub's releases page lists Hermes Agent v0.16.0, the "Surface Release," with a June 5, 2026 release date, bringing a native desktop app, web dashboard administration panel, and setup improvements. Ten days later, Teknium is pushing on the orchestration layer. The sequence is coherent: first make Hermes easier to reach, then make it better at doing parallel work once a user is inside it.

There is still a gap between asynchronous subagents and durable autonomous workflows. Teknium's own dynamic-workflow PR stresses that durability, resumability, and long-running graph execution are separate concerns from simply spawning more child agents. That is the right caution. Non-blocking delegation reduces friction in the chat loop; it does not automatically solve cost control, verification, stale outputs, or runaway nested execution.

But the update moves Hermes Agent in the direction serious agent users care about: less time watching a single tool call block the interface, more ability to dispatch bounded workstreams and keep the operator in command. For Nous, that is also a competitive positioning move. The open-source agent market is crowded with demos that can call tools. Hermes is trying to prove that the runtime layer (scheduling, memory, delegation, permissions, and human steering) is where useful agents will be won.
