MiniMax M3

Overview

MiniMax M3 is the third-generation large language model from MiniMax, a Shanghai-based AI lab backed by Tencent, Alibaba, and miHoYo. Released on June 1, 2026, M3 introduces MiniMax Sparse Attention (MSA), a sub-quadratic attention architecture that supports a context of up to 1M tokens at 1/20th the per-token compute of the previous generation. M3 is natively multimodal, accepting image and video input. It is also the first domestic model to combine frontier coding ability, million-token context, and native multimodality, and it does so as an open-source model.

Key Features

  • MiniMax Sparse Attention (MSA): A two-stage GQA-based sparse attention architecture — a lightweight index branch selects relevant KV blocks, and a sparse branch computes attention only on those blocks. Uses a KV-outer-gather-Q operator design that reads each block once with sequential memory access, achieving 4x+ speedup over open-source Flash-Sparse-Attention and FlashMoBA
  • 1M Context Window: At 1M token context, per-token compute is 1/20th of M2; prefill is 9x+ faster, decoding is 15x+ faster. MSA matches full-attention quality in most ablation experiments
  • Native Multimodal: Trained with multimodal data from step 0 (not a post-hoc adapter). Supports image and video input, plus Computer Use (desktop operation). Interleaved data (alternating text and images) proved more critical than previously assumed; training data was scaled to ~100 trillion tokens
  • Frontier Coding & Agent: Scored 59.0% on SWE-Bench Pro (surpassing GPT-5.5 and Gemini 3.1 Pro, approaching Opus 4.7). Trained with an interactive user simulator framework that models real multi-turn developer collaboration, not just single-turn code generation
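
The two-stage idea behind MSA (an index branch picks KV blocks, a sparse branch attends only within them) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not MiniMax's implementation: the function name `block_sparse_attention`, the `block_size` and `top_k` parameters, and the mean-key scoring rule used for the index branch are all hypothetical, and the sketch handles a single query vector without GQA or the KV-outer-gather-Q operator design.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def block_sparse_attention(q, keys, values, block_size=2, top_k=1):
    """Two-stage sparse attention for one query (illustrative sketch only)."""
    # Split the KV sequence into contiguous (start, end) blocks.
    blocks = [(s, min(s + block_size, len(keys)))
              for s in range(0, len(keys), block_size)]

    # Stage 1, index branch. ASSUMPTION: each block is scored by the dot
    # product of the query with the block's mean key; the real index
    # branch is not described at this level of detail in the source.
    def block_score(span):
        lo, hi = span
        mean_key = [sum(keys[i][d] for i in range(lo, hi)) / (hi - lo)
                    for d in range(len(q))]
        return dot(q, mean_key)

    ranked = sorted(blocks, key=block_score, reverse=True)
    selected = sorted(ranked[:top_k])  # keep the top-k blocks, in sequence order

    # Stage 2, sparse branch: scaled dot-product attention restricted to
    # the tokens inside the selected blocks.
    idx = [i for lo, hi in selected for i in range(lo, hi)]
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, keys[i]) * scale for i in idx])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx))
           for d in range(len(values[0]))]
    return out, selected


if __name__ == "__main__":
    keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    out, selected = block_sparse_attention([0.0, 5.0], keys, values)
    print(selected, out)
```

Setting `top_k` to the total number of blocks makes the sketch equivalent to ordinary full attention, which is a convenient way to compare sparse and dense outputs on small inputs.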

Best Use Cases

  • Agentic Coding: Long-session software engineering with multi-turn collaboration, requirement clarification, and iterative debugging — demonstrated by autonomously reproducing an ICLR 2025 paper (12 hours, 18 commits, 23 experiment charts)
  • Long-Running Autonomous Tasks: Complex multi-day agent workflows — demonstrated by optimizing FP8 GEMM CUDA kernels over 24 hours (147 benchmark submissions, 1959 tool calls, improving GPU utilization from 7.6% to 71.3%)
  • Multimodal Document Understanding: Processing documents with embedded images, charts, formulas, and tables in a single context window
  • Computer Use: Desktop automation across applications, files, and systems (e.g., reading Excel data and entering it into an ERP client)

Capabilities and Limitations

Capability        Description
Reasoning         Frontier-tier; 0.37 on PostTrainBench (close to GPT-5.5's 0.39 and Opus 4.7's 0.42)
Coding            SWE-Bench Pro 59.0%, TerminalBench 2.1 66.0%, SWE-fficiency 34.8%, KernelBench Hard 28.8%, MCP Atlas 74.2%
Multimodal        Native image + video input, text output; surpassed Gemini 3.1 Pro on OmniDocBench
Agent             Top score on Claw-Eval; surpassed Opus 4.7 on SVG-Bench
Response Speed    9x+ prefill speedup, 15x+ decoding speedup vs. M2 at 1M context; supports thinking and non-thinking modes
Context Window    1,000,000 tokens (1M)
Max Output        Not yet disclosed
Tool Use          Function calling, Computer Use, multi-agent orchestration via MiniMax Code
Computer Use      Desktop operation across applications

Known Limitations

  • Exact total/activated parameter count not yet disclosed (technical report expected within 10 days of launch)
  • Context >512K tokens currently available on a limited basis (full availability in days)

Pricing

Prices are in credits per token.

Model         Input    Cache Write    Cache Read    Output
MiniMax M3    0.30     0.30           0.06          1.20
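
As a rough illustration of how the rates above combine, the sketch below totals the cost of a request in credits. The function name `estimate_credits`, its parameters, and the assumption that the four token classes are billed independently and simply summed are hypothetical; the source lists only the per-token rates.

```python
from decimal import Decimal

# Rates copied from the pricing table, in credits per token.
RATES = {
    "input": Decimal("0.30"),
    "cache_write": Decimal("0.30"),
    "cache_read": Decimal("0.06"),
    "output": Decimal("1.20"),
}


def estimate_credits(input_tokens=0, cache_write_tokens=0,
                     cache_read_tokens=0, output_tokens=0):
    """Estimate the credits for one request.

    ASSUMPTION: each token class is charged at its own rate and the
    charges are added together; actual billing rules may differ.
    """
    counts = {
        "input": input_tokens,
        "cache_write": cache_write_tokens,
        "cache_read": cache_read_tokens,
        "output": output_tokens,
    }
    return sum(RATES[name] * count for name, count in counts.items())


if __name__ == "__main__":
    # A long-context request where most of the prompt is served from cache.
    total = estimate_credits(input_tokens=200_000,
                             cache_read_tokens=800_000,
                             output_tokens=4_000)
    print(total)
```

Under these assumptions, a request with 200,000 uncached input tokens, 800,000 cache-read tokens, and 4,000 output tokens comes to 112,800 credits (60,000 + 48,000 + 4,800). `Decimal` is used rather than floats so the totals match the listed rates exactly.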