
GLM-4.5 Model Documentation

Overview

GLM-4.5 is a Mixture-of-Experts (MoE) large language model designed to unify agentic, reasoning, and coding capabilities in a single model. It targets complex problem-solving, logical deduction, and strong performance across a broad range of tasks, supported by flexible training paradigms and data generation strategies.

Key Variants

  • GLM-4.5 (the flagship model)
  • GLM-4.5-Air (a smaller, more efficient variant)
  • GLM-4.5-Base (the pre-trained base model)

Note: DeepSeek-R1-0528 and Kimi K2 are not GLM-4.5 variants; the paper uses them as comparison baselines.

Problem Statement

GLM-4.5 addresses several limitations of existing models:

  • Lack of a single powerful open-source model that excels across diverse tasks, including reasoning and coding.
  • Inefficiencies in training due to underutilization of GPU resources during synchronous training.
  • Challenges in maintaining long-context capabilities during multi-stage reinforcement learning (RL).

Key Contributions

  • Hybrid Reasoning Modes: Supports a thinking mode for complex reasoning and agentic tasks alongside a direct-response mode for immediate answers.
  • Grouped-Query Attention with Partial RoPE: Applies Rotary Position Embedding (RoPE) to only part of each attention head's dimensions (see the attention sketch after this list).
  • QK-Norm: Normalizes queries and keys to stabilize the range of attention logits.
  • MTP Layer: Adds a Multi-Token Prediction layer that enables speculative decoding during inference, accelerating generation.
  • Dynamic Sampling Temperature: Adjusts the sampling temperature based on reward stabilization to control trajectory diversity during RL.
  • Iterative Self-Distillation: Distills responses from RL-trained models back into the model, improving performance over successive rounds.
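
To make the attention-related items above concrete, here is a minimal sketch of grouped-query attention with partial RoPE and a QK-Norm-style normalization. The head counts, dimensions, rotated fraction, and the use of simple L2 normalization as a stand-in for QK-Norm are illustrative assumptions, not GLM-4.5's published configuration.

```python
# Minimal sketch: grouped-query attention with partial RoPE and QK-Norm.
# Head counts, dimensions, the rotated fraction, and L2 normalization as a
# stand-in for QK-Norm are illustrative assumptions. Causal masking omitted.
import torch
import torch.nn.functional as F

def rope(x, base=10000.0):
    # Rotary position embedding over the last dimension of x: (..., seq, dim).
    seq, dim = x.shape[-2], x.shape[-1]
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def gqa_partial_rope_qknorm(q, k, v, kv_group_size, rope_frac=0.5):
    # q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d); n_q_heads = n_kv_heads * kv_group_size.
    rot = int(q.shape[-1] * rope_frac)                      # partial RoPE: rotate only a prefix of dims
    q = torch.cat([rope(q[..., :rot]), q[..., rot:]], dim=-1)
    k = torch.cat([rope(k[..., :rot]), k[..., rot:]], dim=-1)
    q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)   # QK-Norm-style: bound the attention logits
    k = k.repeat_interleave(kv_group_size, dim=0)           # grouped-query attention: each kv head is
    v = v.repeat_interleave(kv_group_size, dim=0)           # shared by a group of query heads
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(8, 16, 64)                               # 8 query heads, seq 16, head dim 64
k, v = torch.randn(2, 16, 64), torch.randn(2, 16, 64)    # 2 kv heads -> group size 4
print(gqa_partial_rope_qknorm(q, k, v, kv_group_size=4).shape)  # torch.Size([8, 16, 64])
```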

Training and Feedback Mechanisms

Task Scope

GLM-4.5 is capable of handling a wide range of tasks:

  • Mathematics
  • Code Generation
  • Scientific Reasoning
  • Web Search Tasks
  • Safety Evaluation

Feedback Types

  • Dense, reliable rewards from verifiable actions
  • Rule-based feedback
  • Human feedback (RLHF)
  • Model-based feedback (RLAIF)

Algorithm and Training Pipeline

GLM-4.5 employs a sophisticated training pipeline that includes:

  • Reinforcement Learning with Difficulty-Based Curriculum: Adapts training difficulty based on model proficiency.
  • Asynchronous RL Training: Decouples rollout and training engines to enhance efficiency.
  • Two-Stage Curriculum: Introduces progressively more difficult problems to improve learning outcomes (a selection sketch follows this list).
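
The difficulty-based, two-stage curriculum can be sketched as a simple selection rule over problems bucketed by current pass rate. The pass-rate bands, stage boundary, and sorting below are illustrative assumptions rather than the paper's exact recipe.

```python
# Illustrative two-stage, difficulty-based curriculum for selecting RL prompts.
# The pass-rate bands and stage boundary are assumptions, not published values.
from dataclasses import dataclass

@dataclass
class Problem:
    prompt: str
    pass_rate: float  # fraction of current rollouts that solve the problem

def select_batch(problems, stage, batch_size):
    if stage == 1:
        # Stage 1: moderate difficulty -- sometimes solved, never trivial.
        band = [p for p in problems if 0.2 <= p.pass_rate <= 0.8]
    else:
        # Stage 2: progressively harder problems the model rarely solves yet.
        band = [p for p in problems if p.pass_rate < 0.2]
    band.sort(key=lambda p: p.pass_rate, reverse=(stage == 1))
    return band[:batch_size]

pool = [Problem(f"problem-{i}", pass_rate=i / 10) for i in range(11)]
print([p.prompt for p in select_batch(pool, stage=1, batch_size=4)])
print([p.prompt for p in select_batch(pool, stage=2, batch_size=4)])
```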

Hyperparameters

  • Total Parameters: 355 billion
  • Activated Parameters: 32 billion
  • Learning Rate: Warm-up from 0 to 2.5e-4, then decay to 2.5e-5 (a schedule sketch follows this list)
  • Maximum Sequence Length: Extended from 4,096 to 32,768 during training
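
Below is a minimal sketch of a warm-up-then-decay learning-rate schedule matching the endpoints listed above. The warm-up length, total step count, and cosine shape are illustrative assumptions, since the exact schedule shape is not reproduced here.

```python
# Warm-up from 0 to the peak learning rate, then decay toward the final rate.
# The peak and final values follow the figures above; the warm-up length,
# total steps, and cosine shape are illustrative assumptions.
import math

def lr_at(step, warmup_steps=2_000, total_steps=100_000,
          peak_lr=2.5e-4, final_lr=2.5e-5):
    if step < warmup_steps:
        return peak_lr * step / warmup_steps                 # linear warm-up from 0
    progress = min((step - warmup_steps) / (total_steps - warmup_steps), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (peak_lr - final_lr) * cosine          # decay to final_lr

for s in (0, 1_000, 2_000, 50_000, 100_000):
    print(s, f"{lr_at(s):.2e}")
```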

Evaluation and Performance Metrics

GLM-4.5 has been evaluated against various benchmarks, achieving notable results:

  • TAU-Bench: 70.1%
  • AIME 24: 91.0%
  • SWE-bench Verified: 64.2%
  • SafetyBench: 89.87%

Comparative Performance

  • Ranks 3rd overall among evaluated models.
  • Outperforms OpenAI's o3 on AIME 24 and SciCode.
  • Demonstrates superior performance on coding tasks compared to Claude Sonnet 4 and Kimi K2.

Techniques and Modules

MoE Architecture

  • Purpose: Enhances computational efficiency.
  • Implementation: Utilizes loss-free balance routing and sigmoid gates (a routing sketch follows).
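
A minimal sketch of sigmoid-gated top-k routing with a bias-based ("loss-free") load-balancing adjustment in the spirit described above. The expert count, top-k value, and bias update rule are illustrative assumptions, not the model's actual router.

```python
# Sigmoid-gated top-k MoE routing with a bias-based ("loss-free") balancing step.
# Expert count, top-k, and the bias update rate are illustrative assumptions.
import torch

n_experts, top_k, d = 8, 2, 16
w_gate = torch.randn(d, n_experts) * 0.02
balance_bias = torch.zeros(n_experts)        # adjusted online instead of an auxiliary loss

def route(tokens, bias_lr=1e-3):
    global balance_bias
    scores = torch.sigmoid(tokens @ w_gate)                          # sigmoid gates: one score per expert
    top_idx = (scores + balance_bias).topk(top_k, dim=-1).indices    # bias affects selection only
    gates = torch.gather(scores, -1, top_idx)                        # combine weights use raw sigmoid scores
    # Loss-free balancing: push the bias down for overloaded experts, up for underloaded ones.
    load = torch.zeros(n_experts).scatter_add_(0, top_idx.flatten(), torch.ones(top_idx.numel()))
    balance_bias -= bias_lr * (load - load.mean())
    return top_idx, gates

idx, gates = route(torch.randn(4, d))        # route 4 tokens
print(idx.shape, gates.shape)                # torch.Size([4, 2]) torch.Size([4, 2])
```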

Dynamic Sampling Temperature

  • Purpose: Controls trajectory diversity during RL.
  • Implementation: Adjusts temperature based on reward signals (a sketch follows).
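
A rough sketch of one way the sampling temperature could be raised once the reward signal plateaus, keeping rollout diversity up. The window size, tolerance, step size, and cap are assumptions, not the paper's exact rule.

```python
# Illustrative dynamic sampling temperature: once the recent reward signal
# stabilizes, raise the temperature to restore rollout diversity.
# Window size, tolerance, step size, and cap are assumptions.
from collections import deque

class DynamicTemperature:
    def __init__(self, temp=1.0, window=20, tol=0.01, step=0.05, max_temp=1.5):
        self.temp, self.tol, self.step, self.max_temp = temp, tol, step, max_temp
        self.rewards = deque(maxlen=window)

    def update(self, mean_batch_reward):
        self.rewards.append(mean_batch_reward)
        if len(self.rewards) == self.rewards.maxlen:
            if abs(self.rewards[-1] - self.rewards[0]) < self.tol:   # reward has plateaued
                self.temp = min(self.temp + self.step, self.max_temp)
        return self.temp

ctrl = DynamicTemperature()
for _ in range(25):
    temp = ctrl.update(0.40)     # flat rewards -> temperature creeps upward
print(round(temp, 2))            # 1.3
```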

Function Calling RL

  • Purpose: Improves capabilities in function calling.
  • Implementation: Combines step-wise rule-based RL with multi-turn RL (a reward sketch follows).
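
A minimal sketch of what a step-wise, rule-based reward for a single function call might look like before being combined with multi-turn RL. The schema fields, checks, and reward values are purely illustrative assumptions.

```python
# Illustrative step-wise, rule-based reward for a single function call.
# The schema fields, checks, and reward values are assumptions for demonstration.
import json

def step_reward(call_text, tool_schema):
    try:
        call = json.loads(call_text)                 # step 1: the call must parse as JSON
    except json.JSONDecodeError:
        return 0.0
    if call.get("name") not in tool_schema:          # step 2: the named tool must exist
        return 0.2
    required = tool_schema[call["name"]]["required"]
    args = call.get("arguments", {})
    if not all(key in args for key in required):     # step 3: required arguments must be present
        return 0.5
    return 1.0                                       # well-formed call

schema = {"get_weather": {"required": ["city"]}}
print(step_reward('{"name": "get_weather", "arguments": {"city": "Beijing"}}', schema))  # 1.0
print(step_reward('{"name": "get_weather", "arguments": {}}', schema))                   # 0.5
```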

Limitations and Areas for Improvement

  • Performance on certain benchmarks, such as BrowseComp, lags behind competitors.
  • Models trained with certain schedules may underperform on general benchmarks.
  • Ongoing challenges in addressing fairness and bias in model outputs.

Conclusion

GLM-4.5 represents a significant advancement in large language models, particularly in its ability to perform across a diverse set of tasks while incorporating innovative training and evaluation methodologies. Its architecture and training strategies position it as a leading choice for applications requiring robust reasoning and coding capabilities.

Sources

https://arxiv.org/abs/2508.06471v1