MiniMax M3 is the third-generation large language model from MiniMax, a Shanghai-based AI lab backed by Tencent, Alibaba, and miHoYo. Released on June 1, 2026, M3 introduces MiniMax Sparse Attention (MSA), a sub-quadratic attention architecture that supports contexts of up to 1M tokens at 1/20th the per-token compute of the previous generation (M2). M3 is natively multimodal, accepting image and video input. It is the first Chinese model, and an open-source one, to combine frontier coding ability, million-token context, and native multimodality.
- MiniMax Sparse Attention (MSA): A two-stage, GQA-based sparse attention architecture in which a lightweight index branch selects the relevant KV blocks and a sparse branch computes attention only over those blocks. Its KV-outer-gather-Q operator design reads each KV block once with sequential memory access, achieving a 4x+ speedup over the open-source Flash-Sparse-Attention and FlashMoBA implementations
- 1M Context Window: At a 1M-token context, per-token compute is 1/20th that of M2, prefill is 9x+ faster, and decoding is 15x+ faster. In most ablation experiments, MSA matches full-attention quality
- Native Multimodal: Trained on multimodal data from step 0 rather than through a post-hoc adapter. Supports image and video input, plus Computer Use (desktop operation). Interleaved data (alternating text and images) proved more important than previously assumed; training data was scaled to ~100 trillion tokens
- Frontier Coding & Agent: Scored 59.0% on SWE-Bench Pro (surpassing GPT-5.5 and Gemini 3.1 Pro, approaching Opus 4.7). Trained with an interactive user simulator framework that models real multi-turn developer collaboration, not just single-turn code generation
- Agentic Coding: Long-session software engineering with multi-turn collaboration, requirement clarification, and iterative debugging, demonstrated by autonomously reproducing an ICLR 2025 paper (12 hours, 18 commits, 23 experiment charts)
- Long-Running Autonomous Tasks: Complex multi-day agent workflows, demonstrated by optimizing FP8 GEMM CUDA kernels over 24 hours (147 benchmark submissions, 1,959 tool calls, raising GPU utilization from 7.6% to 71.3%)
- Multimodal Document Understanding: Processing documents with embedded images, charts, formulas, and tables in a single context window
- Computer Use: Desktop automation across applications, files, and systems (e.g., reading Excel data and entering it into an ERP client)
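
To make the two-stage MSA design concrete, here is a minimal pure-Python sketch for a single attention head. Only the outline comes from the description above (an index branch picks KV blocks, a sparse branch attends only to them, and a KV-outer loop reads each block once while gathering the queries that selected it). Everything else is an assumption: scoring blocks by the dot product with a mean-pooled block key, the top-k rule, the non-causal setting, and the function names `index_branch`, `sparse_branch_kv_outer`, and `reference_attention` are all hypothetical and are not MiniMax's kernel. In a GQA layout the block selection would presumably be shared by the query heads of a group; that detail is omitted here.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def block_range(b: int, block_size: int, n_keys: int) -> range:
    # Key indices covered by block b (the last block may be partial).
    return range(b * block_size, min((b + 1) * block_size, n_keys))


def index_branch(queries: Matrix, keys: Matrix, block_size: int, top_k: int) -> List[List[int]]:
    """Hypothetical index branch: score each KV block by q . mean(block keys)
    and keep the top_k highest-scoring blocks for every query."""
    n_blocks = math.ceil(len(keys) / block_size)
    pooled = []
    for b in range(n_blocks):
        block = [keys[j] for j in block_range(b, block_size, len(keys))]
        pooled.append([sum(col) / len(block) for col in zip(*block)])
    selected = []
    for q in queries:
        scores = [dot(q, p) for p in pooled]
        best = sorted(range(n_blocks), key=lambda b: scores[b], reverse=True)[:top_k]
        selected.append(sorted(best))
    return selected


def sparse_branch_kv_outer(queries, keys, values, block_size, selected) -> Matrix:
    """Attention restricted to the selected blocks, with KV blocks in the outer
    loop: each block is read once, and the queries that selected it are gathered
    and updated with an online (streaming) softmax."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    n_blocks = math.ceil(len(keys) / block_size)
    # Invert query -> blocks into block -> queries ("gather Q").
    gathered = [[] for _ in range(n_blocks)]
    for qi, blocks in enumerate(selected):
        for b in blocks:
            gathered[b].append(qi)
    # Online-softmax state per query: running max, normalizer, weighted value sum.
    run_max = [-math.inf] * len(queries)
    norm = [0.0] * len(queries)
    acc = [[0.0] * len(values[0]) for _ in queries]
    for b in range(n_blocks):  # outer loop: one sequential pass over KV blocks
        for qi in gathered[b]:
            for j in block_range(b, block_size, len(keys)):
                s = dot(queries[qi], keys[j]) * scale
                new_max = max(run_max[qi], s)
                rescale = math.exp(run_max[qi] - new_max)  # 0.0 on the first update
                p = math.exp(s - new_max)
                norm[qi] = norm[qi] * rescale + p
                acc[qi] = [a * rescale + p * v for a, v in zip(acc[qi], values[j])]
                run_max[qi] = new_max
    return [[a / norm[qi] for a in acc[qi]] for qi in range(len(queries))]


def reference_attention(queries, keys, values, block_size, selected) -> Matrix:
    """Query-outer softmax attention over the same selected keys, for checking."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q, blocks in zip(queries, selected):
        idx = [j for b in blocks for j in block_range(b, block_size, len(keys))]
        scores = [dot(q, keys[j]) * scale for j in idx]
        top = max(scores)
        w = [math.exp(s - top) for s in scores]
        z = sum(w)
        out.append([sum(wi * values[j][c] for wi, j in zip(w, idx)) / z
                    for c in range(len(values[0]))])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n, d, block_size, top_k = 16, 4, 4, 2
    Q, K, V = ([[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)] for _ in range(3))
    sel = index_branch(Q, K, block_size, top_k)
    out = sparse_branch_kv_outer(Q, K, V, block_size, sel)
    ref = reference_attention(Q, K, V, block_size, sel)
    err = max(abs(a - b) for ro, rr in zip(out, ref) for a, b in zip(ro, rr))
    print(f"blocks chosen by query 0: {sel[0]}; max |KV-outer - reference| = {err:.1e}")
```

The ordering is the point of `sparse_branch_kv_outer`: rather than each query walking over its own blocks (as `reference_attention` does), every block is visited exactly once, in order, and all queries that chose it are updated together. As we read the description, this is what allows a real kernel to stream KV memory sequentially. The online-softmax rescaling keeps the result equal to the query-outer version up to floating-point error.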
| Capability | Description |
|---|---|
| Reasoning | Frontier-tier; 0.37 on PostTrainBench (close to GPT-5.5's 0.39 and Opus 4.7's 0.42) |
| Coding | SWE-Bench Pro 59.0%, TerminalBench 2.1 66.0%, SWE-fficiency 34.8%, KernelBench Hard 28.8%, MCP Atlas 74.2% |
| Multimodal | Native image + video input, text output; surpassed Gemini 3.1 Pro on OmniDocBench |
| Agent | Top score on Claw-Eval; surpassed Opus 4.7 on SVG-Bench |
| Response Speed | 9x+ prefill speedup, 15x+ decoding speedup vs. M2 at 1M context; supports thinking and non-thinking modes |
| Context Window | 1,000,000 tokens (1M) |
| Max Output | — (not yet disclosed) |
| Tool Use | Function calling, Computer Use, multi-agent orchestration via MiniMax Code |
| Computer Use | Desktop operation across applications |
- Exact total/activated parameter count not yet disclosed (technical report expected within 10 days of launch)
- Contexts longer than 512K tokens are currently available on a limited basis (full availability expected within days)

## Pricing
| Model | Input (Credits/Token) | Cache Write (Credits/Token) | Cache Read (Credits/Token) | Output (Credits/Token) |
|---|---|---|---|---|
| MiniMax M3 | 0.30 | 0.30 | 0.06 | 1.20 |
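
As a worked example of these rates, the sketch below estimates the credits for a single request. It assumes each prompt token is billed under exactly one of the input, cache-write, or cache-read rates; the table does not state how these combine, and `request_cost` is a hypothetical helper, not a MiniMax API.

```python
from decimal import Decimal

# Per-token rates in credits, copied from the MiniMax M3 row of the pricing table.
RATES = {
    "input": Decimal("0.30"),
    "cache_write": Decimal("0.30"),
    "cache_read": Decimal("0.06"),
    "output": Decimal("1.20"),
}


def request_cost(input_tokens: int = 0, cache_write_tokens: int = 0,
                 cache_read_tokens: int = 0, output_tokens: int = 0) -> Decimal:
    """Estimate the credits charged for one request (hypothetical helper).

    Assumes every prompt token falls into exactly one of the uncached-input,
    cache-write, or cache-read buckets; output tokens use the output rate.
    """
    counts = {
        "input": input_tokens,
        "cache_write": cache_write_tokens,
        "cache_read": cache_read_tokens,
        "output": output_tokens,
    }
    if any(n < 0 for n in counts.values()):
        raise ValueError("token counts must be non-negative")
    # Decimal keeps the rate arithmetic free of binary floating-point rounding.
    return sum((RATES[name] * n for name, n in counts.items()), Decimal(0))


if __name__ == "__main__":
    cached = request_cost(input_tokens=10_000, cache_read_tokens=90_000, output_tokens=2_000)
    uncached = request_cost(input_tokens=100_000, output_tokens=2_000)
    print(cached, uncached)  # → 10800.00 32400.00
```

For example, a 100,000-token prompt with 90,000 tokens served from cache plus 2,000 output tokens comes to 10,000 × 0.30 + 90,000 × 0.06 + 2,000 × 1.20 = 10,800 credits under this assumption, versus 32,400 credits with no cache hits.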