Hermes Agent adds virtual models for multi-model AI routing

The lab claims its MoA setup beats Opus 4.8 and GPT 5.5 on an unreleased HermesBench, but the benchmark is not public yet.

By · Published

Why it matters

Nous is turning model orchestration into a product surface. If Hermes can make multi-model routing feel like choosing one model, the agent layer, not the model lab, owns the workflow.

Hermes Agent adds virtual models for multi-model AI routing — The lab claims its MoA setup beats Opus 4.8 and GPT 5.5 on an unreleased HermesBench, but the benchmark is not public yet.

Nous Research (@NousResearch) said Friday that Hermes Agent now exposes Mixture of Agents presets as virtual models, making a multi-model workflow selectable like any other model inside the agent.

The move, announced in a two-post thread on X, is a product-level attempt to turn a familiar research pattern into a default user interface: run several models first, feed their outputs to an aggregator, and let the aggregator produce the final answer and tool calls. Nous framed the release as a way to get capabilities beyond individual gated frontier models, claiming its MoA presets scored 8% higher than Opus 4.8 and 11% higher than GPT 5.5 on an upcoming HermesBench benchmark.

That benchmark is the unresolved part of the announcement. Nous said a full HermesBench leaderboard is forthcoming, but did not publish the leaderboard, task mix, evaluation method, sample size, or the exact MoA preset behind the comparison in the X thread. Until those details are public, the percentage gains are Nous' own claim, not an independently checkable result.

What is verifiable today is the implementation shape. In the Hermes Agent MoA documentation, Nous describes Mixture of Agents as a virtual model provider. Each named preset appears under the moa provider, and the preset can be selected through the same model picker surfaces used by the CLI, gateway, terminal UI, dashboard, and desktop app. In the desktop app, the model dropdown shows an MoA presets section, according to the docs.

The architecture matters because it keeps the multi-model step inside Hermes' agent loop rather than treating it as an external prompt hack. Nous says the aggregator is the acting model: it writes the assistant response and emits tool calls. Reference models run first and provide analysis for the aggregator to use. Hermes then treats the aggregator response as the real model response, executes any tool calls normally, and repeats the MoA process on the next model iteration after tool results are added.

That design is the commercial bet under the feature. Model access has become a distribution problem as much as a capability problem: the strongest models are not equally available to every builder, and even when they are available, they differ in cost, latency, tool-call behavior, and reliability. Nous is positioning Hermes Agent as a layer where users can compose models from multiple providers without rewriting their workflows around each vendor's interface.

Hermes Agent already leans into that neutral control-plane role. The project's GitHub README says Hermes can use Nous Portal, OpenRouter, NovitaAI, NVIDIA NIM, Hugging Face, OpenAI, or a user's own endpoint, with model switching handled through hermes model. The same README describes Hermes as a self-improving agent with memory, skill creation, scheduled automations, and messaging surfaces across Telegram, Discord, Slack, WhatsApp, Signal, and CLI. GitHub listed the repository at 204,000 stars and 36,500 forks when checked Friday.

The MoA feature extends that thesis from model choice to model composition. A user can define presets in config.yaml, through the dashboard, through desktop settings, or with hermes moa configure. The docs show a preset format that separates reference_models from an aggregator, including provider and model names for each. The key product detail is that the preset then shows up as a normal model name, not as a separate workflow users have to remember to run.

That simplicity cuts both ways. A virtual model can make ensemble reasoning feel like a single model call, but the underlying cost and latency still depend on how many reference models run, which providers they hit, and how often the agent loops after tool calls. Nous' docs acknowledge part of that tradeoff by saying reference models receive only conversation text, not the Hermes system prompt or tool-call transcript, so those calls stay cheaper and avoid strict-provider rejections.

For Nous Research, the timing is also pointed. Hermes Agent's public site now describes the software as open source under the MIT License and says paid Nous Portal tiers include credits for Hermes Agent, access to more than 300 models, and built-in tool use. MoA presets give Nous another reason for users to route work through Hermes and Portal: the value proposition is no longer just access to models, but orchestration across them.

The stronger claim - that a configured Hermes MoA preset can outperform individual frontier systems on HermesBench - still rests on a leaderboard Nous has not released. The product shift is available to inspect now. The benchmark case remains a company assertion until Nous publishes enough detail for outsiders to reproduce or challenge it.

Reader comments

Conversation for this story loads after sign-in.