Xiaomi

Xiaomi: MiMo-V2.5

xiaomi/mimo-v2.5

MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surpassing MiMo-V2-Omni in multimodal perception across image and video understanding tasks. Its 1M context window supports complete documents, extended conversations, and complex task contexts in a single pass, making it ideal for integration with agent frameworks where strong reasoning, rich perception, and cost efficiency all matter.

Context window: 1,048,576 tokens
Input: text, audio, image, video
Output: text
Pricing: $0.14/M input tokens, $0.28/M output tokens

View on OpenRouter. Model data sourced from OpenRouter.