Rajit Khanna turns PrismVideos' Hermes rebuild into an agent API
After replacing a Vercel-based media agent with Hermes, PrismVideos is pitching hosted agent infrastructure for teams that would rather ship tools than memory.
By Ryan Merket · Published
Why it matters
PrismVideos is betting that agent memory, sandboxes, filesystems and automations are becoming infrastructure, not product differentiation. If that proves right, vertical AI teams will compete less on custom harness code and more on proprietary workflows, data integrations and learned user preferences.

Rajit Khanna is turning PrismVideos, an AI media creation product, into an agent-infrastructure provider after PrismVideos replaced its own custom media agent with a hosted Hermes setup and began pitching the same pattern as an API.
The argument in Khanna's post is direct: do not spend scarce engineering time rebuilding the generic parts of an agent. Bring a system prompt, tools, skills and connectors, then let a hosted Hermes instance handle memory, sessions, filesystem, automations and deployment. PrismVideos' proposed endpoint, POST /v1/deployments, takes a customer ID, model, sandbox settings, MCP servers, skills and secrets, then returns a deployment ID, thread ID, workspace path and events URL for server-sent events.
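The response side of that exchange can be sketched in TypeScript. This is a minimal illustration, not PrismVideos' client code: the post names the concepts (deployment ID, thread ID, workspace path, events URL), but the JSON key names, the helper names and the event payloads below are assumptions. The parser follows the standard text/event-stream framing and assumes each chunk contains whole events.

```typescript
// Hypothetical response shape: the key names are assumed for illustration.
interface DeploymentResponse {
  deploymentId: string;
  threadId: string;
  workspacePath: string;
  eventsUrl: string;
}

interface AgentEvent {
  event: string; // SSE event name; "message" when no event field is sent
  data: string; // all data lines of the event, joined with newlines
}

// Validate an untyped JSON value and narrow it to DeploymentResponse.
function parseDeploymentResponse(json: unknown): DeploymentResponse {
  if (typeof json !== "object" || json === null) {
    throw new Error("expected a JSON object");
  }
  const obj = json as Record<string, unknown>;
  const read = (key: string): string => {
    const value = obj[key];
    if (typeof value !== "string") {
      throw new Error(`missing or non-string field: ${key}`);
    }
    return value;
  };
  return {
    deploymentId: read("deploymentId"),
    threadId: read("threadId"),
    workspacePath: read("workspacePath"),
    eventsUrl: read("eventsUrl"),
  };
}

// Parse a text/event-stream chunk made of complete events.
// Buffering partial events across network chunks is left out of this sketch.
function parseSseChunk(chunk: string): AgentEvent[] {
  const events: AgentEvent[] = [];
  // A blank line terminates each event.
  for (const block of chunk.split(/\r?\n\r?\n/)) {
    let event = "message";
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith(":")) continue; // comment or keep-alive line
      if (line.startsWith("event:")) {
        event = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        // The format allows one optional space after the colon.
        dataLines.push(line.slice(5).replace(/^ /, ""));
      }
    }
    if (dataLines.length > 0) {
      events.push({ event, data: dataLines.join("\n") });
    }
  }
  return events;
}
```

A caller would validate the deployment response once, then feed text read from the events URL into parseSseChunk to follow the agent's progress.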
That makes the post less a conventional launch announcement than a confession about where the team's earlier agent work broke down. PrismVideos had shipped a media generation agent using the Vercel AI Agents SDK. According to Khanna, that agent could recommend models, generate images and videos, and analyze videos so users could recreate them. Days later, Khanna wrote, Higgsfield launched Supercomputer, an agent with observational memory, skills, automations, a computer and a filesystem. Khanna says PrismVideos would have needed weeks to add those capabilities to its Vercel-based implementation.
The lesson PrismVideos drew is the one more AI application teams are running into: the user does not care whether memory, sandboxing and workflow automation are custom-built. The user notices when a competitor has them.
PrismVideos is selling the rebuild, not just the app
Khanna says PrismVideos deleted its existing agent and stood up an EC2 instance running a Hono server that creates a Hermes agent in a Docker container for each customer. The server acts as a reverse proxy between the PrismVideos app and the Hermes gateway, with user agents communicating over WebSocket, according to the post.
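The one-container-per-customer routing described here can be sketched as a small registry that the proxy consults before forwarding traffic. This is an illustration under stated assumptions, not PrismVideos' server: AgentRegistry, the launcher callback, the container naming scheme and the port numbers are all hypothetical, and the stub launcher only records what would be started instead of calling Docker or Hono.

```typescript
// Hypothetical record for one customer's agent; in the post this is a Docker container.
interface AgentInstance {
  customerId: string;
  containerName: string;
  upstreamUrl: string; // where the proxy would forward WebSocket traffic
}

// Stands in for whatever actually starts the container (assumed, injected by the caller).
type Launcher = (customerId: string, port: number) => AgentInstance;

class AgentRegistry {
  private readonly instances = new Map<string, AgentInstance>();
  private readonly launch: Launcher;
  private nextPort: number;

  constructor(launch: Launcher, basePort: number = 9000) {
    this.launch = launch;
    this.nextPort = basePort;
  }

  // Return the customer's existing agent, or create one on first contact.
  resolve(customerId: string): AgentInstance {
    const existing = this.instances.get(customerId);
    if (existing !== undefined) return existing;
    const created = this.launch(customerId, this.nextPort);
    this.nextPort += 1;
    this.instances.set(customerId, created);
    return created;
  }

  // Number of agents created so far.
  get size(): number {
    return this.instances.size;
  }
}

// Stub launcher: describes the container without touching Docker.
const stubLauncher: Launcher = (customerId, port) => ({
  customerId,
  containerName: `hermes-${customerId}`,
  upstreamUrl: `ws://127.0.0.1:${port}`,
});
```

In a real server, the reverse proxy would call resolve with the authenticated customer ID on each incoming connection and pipe the socket to upstreamUrl. Injecting the launcher keeps the lookup logic testable without a container runtime.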
The media-specific work then moves to the edges. PrismVideos can define a system prompt, expose its own media tools through MCP, add skills for UGC video creation, storyboarding and visual effects, and connect services such as Meta Ads Manager, Google Drive and Resend. That is the part PrismVideos can plausibly own. The rest - memory, session compaction, filesystem persistence, automations, built-in tools and self-learning loops - is treated as runtime plumbing.
PrismVideos' core product already sits in a crowded application layer. The site describes an AI media studio that centralizes video, image and audio models including Google Veo, OpenAI Sora and GPT Image, xAI Grok Imagine, Kling, Seedance and Flux. It advertises apps for video generation, image editing, face swap, lipsync, virtual staging and visual effects, plus a Prism Agent that can orchestrate hundreds of AI tools. Those are PrismVideos' claims, not independently audited usage or revenue metrics.
The new API moves PrismVideos closer to the picks-and-shovels layer. The example request in Khanna's post uses runtime: "hermes", model: "anthropic/claude-sonnet-4.5", Docker sandboxing, a persistent filesystem, MCP tool definitions and feature flags for memory, dreaming, automations, steering and filesystem webhooks. The brand boundary is still loose: the post uses a PRISM_API_KEY while the sample response points to api.prismagents.com. Pricing, service-level commitments, beta status and customer count are not disclosed.
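A request along those lines might be assembled as follows. The runtime and model values come from the example in Khanna's post; everything else is an assumption for illustration, including the JSON key names, the skill identifiers, the MCP server entry, the bearer-token header and the placeholder host. The sketch only builds the request and does not send it.

```typescript
// Hypothetical MCP server entry; the real schema is not given in the post.
interface McpServer {
  name: string;
  url: string;
}

// Assumed request shape for POST /v1/deployments.
interface DeploymentRequest {
  customerId: string;
  runtime: "hermes";
  model: string;
  sandbox: { type: "docker"; filesystem: "persistent" };
  mcpServers: McpServer[];
  skills: string[];
  secrets: Record<string, string>;
  features: {
    memory: boolean;
    dreaming: boolean;
    automations: boolean;
    steering: boolean;
    filesystemWebhooks: boolean;
  };
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Fill in the values quoted in the post and enable every feature flag it lists.
function buildDeploymentRequest(
  customerId: string,
  mcpServers: McpServer[],
  skills: string[],
  secrets: Record<string, string>
): DeploymentRequest {
  return {
    customerId,
    runtime: "hermes",
    model: "anthropic/claude-sonnet-4.5",
    sandbox: { type: "docker", filesystem: "persistent" },
    mcpServers,
    skills,
    secrets,
    features: {
      memory: true,
      dreaming: true,
      automations: true,
      steering: true,
      filesystemWebhooks: true,
    },
  };
}

// Describe the HTTP call; the bearer scheme and base URL are assumptions.
function toHttpRequest(
  baseUrl: string,
  apiKey: string,
  request: DeploymentRequest
): HttpRequest {
  return {
    url: `${baseUrl}/v1/deployments`,
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(request),
  };
}

// Example values only: the host, tool server and skill names are placeholders.
const exampleRequest = toHttpRequest(
  "https://api.example.com",
  "PRISM_API_KEY-goes-here",
  buildDeploymentRequest(
    "customer_123",
    [{ name: "media-tools", url: "https://mcp.example.com/media" }],
    ["ugc-video", "storyboarding"],
    { RESEND_API_KEY: "placeholder" }
  )
);
```

The resulting object could be passed to any HTTP client. Keeping construction separate from sending makes the payload easy to inspect before a key is ever attached to a live call.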
The competitive point is time-to-parity
Khanna's strongest point is not that Hermes is the final answer. It is that the harness layer is moving too quickly to be a durable moat for most vertical AI products. He describes Nous Research's Hermes as an open-source personal agent with 185,000-plus GitHub stars at the time of writing. The number is from Khanna's post, but the strategic point does not depend on the exact count: if a shared runtime gives teams memory, tools, automations and filesystem semantics out of the box, a product team that builds all of that itself starts each cycle behind.
Khanna also points to LangChain's Managed Deep Agents and Anthropic's writing on harness design for long-running agents as signs that the abstraction is converging. Application teams want to specify behavior and proprietary capabilities, not repeatedly implement session management, credential handling, browser access, sandboxing and long-running task orchestration.
That pattern is showing up outside media generation as well. RuntimeWire reported in May that Zerostack shipped a small Rust-native coding agent with multi-model support, a terminal UI, sandboxed bash and permissions. Different market, same pressure: agent builders are packaging the operational layer because end users increasingly expect ChatGPT- and Claude-like behaviors inside specialized software.
The unanswered question is who owns the agent runtime
PrismVideos' bet is founder-friendly in the practical sense: stop doing undifferentiated work, ship the customer-facing workflow. Khanna writes that AI agent products are more likely to create differentiated value by integrating with customers' proprietary data and learning their preferences than by creating the best harness for a particular use case.
That is also where the risk sits. If Hermes becomes the standard primitive, PrismVideos' API has a clear story. If OpenAI, Anthropic, LangChain, Vercel or another managed-runtime provider turns the same feature set into cheap infrastructure, PrismVideos must prove that its hosted Hermes implementation, media workflow expertise or developer experience is enough to matter beyond PrismVideos' own app.
For now, Khanna has put a name to a real tradeoff facing AI application teams. Every hour spent reproducing memory, skills, browser tools, automations and container isolation is an hour not spent encoding what the product knows about its users. PrismVideos is betting that more teams will choose the runtime shortcut, even if the winning runtime is still being decided.