Nvidia

NVIDIA: Nemotron 3 Nano Omni (free)

nvidia/nemotron-3-nano-omni-30b-a3b-reasoning

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and audio inputs and produces text output, enabling agents to perceive and reason across modalities in a single inference loop. Built on a hybrid MoE Transformer-Mamba architecture with Conv3D video layers and Efficient Video Sampling (EVS), it delivers approximately 2× higher throughput and 2.5× lower compute for video reasoning versus separate vision + speech pipelines. It supports up to 300K context length and a 16,384 reasoning budget, with extended thinking enabled via reasoning.enabled on OpenRouter.

Context window: 256,000 tokens
Input: text, audio, image, video
Output: text
Pricing: $0/M input tokens, $0/M output tokens

View on OpenRouter. Model data sourced from OpenRouter.