GLM-5.1

Overview

GLM-5.1 is an open-source flagship AI model developed by Z.ai (formerly Zhipu AI, a Tsinghua University spinoff and the first publicly traded foundation-model company). Released on April 7, 2026, it is a post-training upgrade to GLM-5, built on a 754-billion-parameter Mixture-of-Experts architecture that activates 40 billion parameters per token. GLM-5.1 is purpose-built for agentic engineering and long-horizon autonomous software development, scoring 58.4% on SWE-Bench Pro (ranked #1 as of April 2026).

Key Features

  • Agentic Coding: SWE-Bench Pro 58.4% (ranked #1 as of April 2026), ahead of GPT-5.4 (57.7%) and Claude Opus 4.6 (57.3%).
  • 8-Hour Autonomous Execution: Can work autonomously on a single task for up to 8 hours, completing full plan-execute-test-fix-optimize loops across hundreds of iterations and thousands of tool calls.
  • MIT Licensed Open Weights: Released under the MIT license on Hugging Face — one of the most permissive open-source licenses, allowing unrestricted commercial use, modification, and fine-tuning.
  • Ascend-Native Training: Trained entirely on Huawei Ascend 910B chips using the MindSpore framework, achieving full independence from US-manufactured hardware.
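The plan-execute-test-fix loop described above can be sketched as a simple control structure. Everything below is a hypothetical stand-in, not part of any GLM-5.1 SDK: in a real agent, the plan, the generated code, and the test results would all come from model calls and tool executions rather than the placeholder logic used here.

```python
def run_agent_loop(task, max_iterations=5):
    """Iterate plan -> execute -> test -> fix until tests pass or budget runs out.

    All inner steps are mocked placeholders; a real agent would back them
    with model-issued tool calls and actual test runs.
    """
    plan = f"plan for: {task}"        # stand-in for a model-generated plan
    attempt = 0
    history = []
    while attempt < max_iterations:
        attempt += 1
        passed = attempt >= 3         # stand-in test result: succeed on 3rd try
        history.append((attempt, passed))
        if passed:
            return {"status": "done", "iterations": attempt, "history": history}
        plan += " + fix"              # fold test feedback back into the plan
    return {"status": "gave_up", "iterations": attempt, "history": history}

result = run_agent_loop("add pagination to the API")
```

The loop terminates on either success or an iteration budget; the 8-hour autonomous runs described above are this same pattern scaled to hundreds of iterations.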

Best Use Cases

  • Long-Horizon Software Engineering: Excels at complex, multi-step coding tasks requiring sustained autonomous execution — e.g., building a complete Linux desktop system from scratch within 8 hours.
  • Agentic Tool Orchestration: Strong function calling, MCP integration, and structured output support make it ideal for building AI agents that need to interact with external tools and APIs.
  • Cost-Effective Frontier Performance: At $1.40/$4.40 per million input/output tokens, it delivers ~94.6% of Claude Opus 4.6's coding capability at roughly 5-8x lower cost.
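To illustrate the function-calling support mentioned above, here is a sketch of a request payload, assuming an OpenAI-style chat-completions schema. The model name, the `get_weather` tool, and the payload shape are illustrative assumptions, not documented values from the GLM-5.1 API.

```python
import json

# Hypothetical function-calling payload (OpenAI-style schema assumed).
payload = {
    "model": "glm-5.1",  # assumed model identifier
    "messages": [
        {"role": "user", "content": "What's the weather in Beijing?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # illustrative tool, not a real API
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

body = json.dumps(payload)  # serialized body, ready to POST to the (assumed) endpoint
```

An agent framework would send this body to the chat endpoint, read any `tool_calls` in the response, execute the named tool, and append the result to `messages` for the next turn.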

Capabilities and Limitations

Capability | Description
Reasoning | AIME 2026: 95.3%; GPQA-Diamond: 86.2%; strong system-level reasoning across planning and iterative debugging
Coding | SWE-Bench Pro: 58.4% (SOTA); CyberGym: 68.7%; BrowseComp: 68.0%; MCP-Atlas: 71.8%
Multimodal | Text only; no image, audio, or video input (separate GLM-5V-Turbo variant available for vision)
Response Speed | Not independently benchmarked yet; comparable to similar-scale MoE models
Context Window | 200K tokens
Max Output | 128K tokens
Tool Use | Function calling, structured output, context caching, MCP integration, thinking mode
Multilingual | Strong multilingual support, particularly Chinese and English
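The context and output limits above imply a simple token-budgeting step before each request. The sketch below uses a crude word-count proxy for token counting; a real client would use the provider's tokenizer, and the reserved-output default here is an arbitrary assumption.

```python
# Limits from the capability table: 200K context window, 128K max output.
CONTEXT_WINDOW = 200_000
MAX_OUTPUT = 128_000

def input_budget(reserved_output=8_000):
    """Tokens left for the prompt once output space is reserved (assumed default)."""
    reserved = min(reserved_output, MAX_OUTPUT)  # output can never exceed 128K
    return CONTEXT_WINDOW - reserved

def trim_history(messages, budget, count=lambda m: len(m.split())):
    """Drop oldest messages until the estimated total fits the budget.

    `count` is a stand-in estimator (whitespace word count), not a tokenizer.
    """
    kept = list(messages)
    while kept and sum(count(m) for m in kept) > budget:
        kept.pop(0)  # evict oldest first
    return kept

msgs = ["a " * 10, "b " * 10, "c " * 10]  # ~10 "tokens" each under the proxy
```

Trimming oldest-first keeps the most recent turns, which is the usual choice for chat-style agents; summarizing evicted turns instead is a common refinement.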

Known Limitations

  • Text-only input; no native multimodal support (vision is handled by the separate GLM-5V-Turbo model).
  • Math and science scores (AIME 2026: 95.3%, GPQA-Diamond: 86.2%) trail top proprietary models, making it a weaker choice for pure quantitative research.
  • On broader coding composites (Terminal-Bench 2.0 + NL2Repo), Claude Opus 4.6 still leads at 57.5 vs GLM-5.1's 54.9.
  • Self-hosting requires significant compute resources due to the 754B parameter count.

Pricing

Model | Input (Credits/Token) | Cache Write (Credits/Token) | Cache Read (Credits/Token) | Output (Credits/Token)
GLM-5.1 | 1.40 | 1.40 | 0.26 | 4.40
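A quick cost estimate from the rates above. This assumes, as the $1.40/$4.40 per-million-token figures earlier in the document suggest, that the credit rates are quoted per million tokens; no credit-to-dollar exchange rate is given in the source, so results are left in credits.

```python
# Credit rates from the pricing table (assumed to be per million tokens).
RATES = {"input": 1.40, "cache_write": 1.40, "cache_read": 0.26, "output": 4.40}

def request_cost(input_tokens, output_tokens,
                 cache_read_tokens=0, cache_write_tokens=0, per=1_000_000):
    """Total credits for one request, with rates quoted per `per` tokens."""
    return (input_tokens * RATES["input"]
            + cache_write_tokens * RATES["cache_write"]
            + cache_read_tokens * RATES["cache_read"]
            + output_tokens * RATES["output"]) / per

# 100K fresh input tokens + 20K output tokens:
uncached = request_cost(100_000, 20_000)
# Same prompt served from cache (0.26 vs 1.40 per input token):
cached = request_cost(0, 20_000, cache_read_tokens=100_000)
```

Under these assumptions the uncached request costs 0.228 credits and the cached one 0.114, so cache reads cut the input-side cost by over 80% on repeated prompts.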