moonshotai

MoonshotAI: Kimi VL A3B Thinking

moonshotai/kimi-vl-a3b-thinking

Kimi-VL is a lightweight Mixture-of-Experts vision-language model that activates only 2.8B parameters per step while delivering strong performance on multimodal reasoning and long-context tasks. The Kimi-VL-A3B-Thinking variant, fine-tuned with chain-of-thought and reinforcement learning, excels in math and visual reasoning benchmarks like MathVision, MMMU, and MathVista, rivaling much larger models such as Qwen2.5-VL-7B and Gemma-3-12B. It supports 128K context and high-resolution input via its MoonViT encoder.

Context window: 131,072 tokens
Input: image, text
Output: text

View on OpenRouter. Model data sourced from OpenRouter.