stepfun

StepFun: Step 3.7 Flash

stepfun/step-3.7-flash

Step 3.7 Flash is StepFun's latest high-efficiency multimodal Mixture-of-Experts model. It pairs a 196B-parameter language backbone with a vision encoder for native image and video understanding, activating roughly 11B parameters per token. The model supports a 256K context window and exposes selectable reasoning levels (high/medium/low), letting callers trade off speed, cost, and depth of reasoning. Designed for coding, agentic workflows, structured outputs, and long-context productivity tasks.

  • Context window: 256,000 tokens
  • Input: text, image, video
  • Output: text
  • Pricing: $0.2/M input tokens, $1.15/M output tokens

View on OpenRouter. Model data sourced from OpenRouter.