GLM-5.1

Overview

GLM-5.1 is an open-source flagship AI model developed by Z.ai (formerly Zhipu AI, a Tsinghua University spinoff and the first publicly traded foundation-model company). Released on April 7, 2026, it is a post-training upgrade to GLM-5, built on a 754-billion-parameter Mixture-of-Experts architecture that activates 40 billion parameters per token. GLM-5.1 is purpose-built for agentic engineering and long-horizon autonomous software development, scoring 58.4% on SWE-Bench Pro (ranked #1 as of April 2026).

Key Features

  • Agentic Coding: SWE-Bench Pro 58.4% (ranked #1 as of April 2026), ahead of GPT-5.4 (57.7%) and Claude Opus 4.6 (57.3%).
  • 8-Hour Autonomous Execution: Can work autonomously on a single task for up to 8 hours, completing full plan-execute-test-fix-optimize loops across hundreds of iterations and thousands of tool calls.
  • MIT Licensed Open Weights: Released under the MIT license on Hugging Face — one of the most permissive open-source licenses, allowing unrestricted commercial use, modification, and fine-tuning.
  • Ascend-Native Training: Trained entirely on Huawei Ascend 910B chips using the MindSpore framework, achieving full independence from US-manufactured hardware.
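The plan-execute-test-fix loop described above can be sketched as a simple control structure. Everything below is a hypothetical stand-in, not part of any GLM-5.1 SDK: in a real agent, the plan, the generated code, and the test results would all come from model calls and tool executions rather than the placeholder logic used here.

```python
def run_agent_loop(task, max_iterations=5):
    """Iterate plan -> execute -> test -> fix until tests pass or budget runs out.

    All inner steps are mocked placeholders; a real agent would back them
    with model-issued tool calls and actual test runs.
    """
    plan = f"plan for: {task}"        # stand-in for a model-generated plan
    attempt = 0
    history = []
    while attempt < max_iterations:
        attempt += 1
        passed = attempt >= 3         # stand-in test result: succeed on 3rd try
        history.append((attempt, passed))
        if passed:
            return {"status": "done", "iterations": attempt, "history": history}
        plan += " + fix"              # fold test feedback back into the plan
    return {"status": "gave_up", "iterations": attempt, "history": history}

result = run_agent_loop("add pagination to the API")
```

The loop terminates on either success or an iteration budget; the 8-hour autonomous runs described above are this same pattern scaled to hundreds of iterations.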

Best Use Cases

  • Long-Horizon Software Engineering: Excels at complex, multi-step coding tasks requiring sustained autonomous execution — e.g., building a complete Linux desktop system from scratch within 8 hours.
  • Agentic Tool Orchestration: Strong function calling, MCP integration, and structured output support make it ideal for building AI agents that need to interact with external tools and APIs.
  • Cost-Effective Frontier Performance: At $1.40/$4.40 per million input/output tokens, it delivers ~94.6% of Claude Opus 4.6's coding capability at roughly 5-8x lower cost.
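To illustrate the function-calling support mentioned above, here is a sketch of a request payload, assuming an OpenAI-style chat-completions schema. The model name, the `get_weather` tool, and the payload shape are illustrative assumptions, not documented values from the GLM-5.1 API.

```python
import json

# Hypothetical function-calling payload (OpenAI-style schema assumed).
payload = {
    "model": "glm-5.1",  # assumed model identifier
    "messages": [
        {"role": "user", "content": "What's the weather in Beijing?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # illustrative tool, not a real API
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

body = json.dumps(payload)  # serialized body, ready to POST to the (assumed) endpoint
```

An agent framework would send this body to the chat endpoint, read any `tool_calls` in the response, execute the named tool, and append the result to `messages` for the next turn.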

Capabilities and Limitations

Capability | Description
Reasoning | AIME 2026: 95.3%; GPQA-Diamond: 86.2%; strong system-level reasoning across planning and iterative debugging
Coding | SWE-Bench Pro: 58.4% (SOTA); CyberGym: 68.7%; BrowseComp: 68.0%; MCP-Atlas: 71.8%
Multimodal | Text only; no image, audio, or video input (separate GLM-5V-Turbo variant available for vision)
Response Speed | Not independently benchmarked yet; comparable to similar-scale MoE models
Context Window | 200K tokens
Max Output | 128K tokens
Tool Use | Function calling, structured output, context caching, MCP integration, thinking mode
Multilingual | Strong multilingual support, particularly Chinese and English
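The context and output limits above imply a simple token-budgeting step before each request. The sketch below uses a crude word-count proxy for token counting; a real client would use the provider's tokenizer, and the reserved-output default here is an arbitrary assumption.

```python
# Limits from the capability table: 200K context window, 128K max output.
CONTEXT_WINDOW = 200_000
MAX_OUTPUT = 128_000

def input_budget(reserved_output=8_000):
    """Tokens left for the prompt once output space is reserved (assumed default)."""
    reserved = min(reserved_output, MAX_OUTPUT)  # output can never exceed 128K
    return CONTEXT_WINDOW - reserved

def trim_history(messages, budget, count=lambda m: len(m.split())):
    """Drop oldest messages until the estimated total fits the budget.

    `count` is a stand-in estimator (whitespace word count), not a tokenizer.
    """
    kept = list(messages)
    while kept and sum(count(m) for m in kept) > budget:
        kept.pop(0)  # evict oldest first
    return kept

msgs = ["a " * 10, "b " * 10, "c " * 10]  # ~10 "tokens" each under the proxy
```

Trimming oldest-first keeps the most recent turns, which is the usual choice for chat-style agents; summarizing evicted turns instead is a common refinement.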

Known Limitations

  • Text-only input; no native multimodal support (vision is handled by the separate GLM-5V-Turbo model).
  • Math and science scores (AIME 2026: 95.3%, GPQA-Diamond: 86.2%) trail top proprietary models, making it a weaker choice for pure quantitative research.
  • On broader coding composites (Terminal-Bench 2.0 + NL2Repo), Claude Opus 4.6 still leads at 57.5 vs GLM-5.1's 54.9.
  • Self-hosting requires significant compute resources due to the 754B parameter count.

Pricing

Model | Input (Credits/Token) | Cache Write (Credits/Token) | Cache Read (Credits/Token) | Output (Credits/Token)
GLM-5.1 | 1.40 | 1.40 | 0.26 | 4.40
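A quick cost estimate from the rates above. This assumes, as the $1.40/$4.40 per-million-token figures earlier in the document suggest, that the credit rates are quoted per million tokens; no credit-to-dollar exchange rate is given in the source, so results are left in credits.

```python
# Credit rates from the pricing table (assumed to be per million tokens).
RATES = {"input": 1.40, "cache_write": 1.40, "cache_read": 0.26, "output": 4.40}

def request_cost(input_tokens, output_tokens,
                 cache_read_tokens=0, cache_write_tokens=0, per=1_000_000):
    """Total credits for one request, with rates quoted per `per` tokens."""
    return (input_tokens * RATES["input"]
            + cache_write_tokens * RATES["cache_write"]
            + cache_read_tokens * RATES["cache_read"]
            + output_tokens * RATES["output"]) / per

# 100K fresh input tokens + 20K output tokens:
uncached = request_cost(100_000, 20_000)
# Same prompt served from cache (0.26 vs 1.40 per input token):
cached = request_cost(0, 20_000, cache_read_tokens=100_000)
```

Under these assumptions the uncached request costs 0.228 credits and the cached one 0.114, so cache reads cut the input-side cost by over 80% on repeated prompts.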