Moonshot AI Releases Kimi K2.5 - 1T Parameter Native Multimodal Agent Model

On January 27, 2026, Moonshot AI officially released and open-sourced Kimi K2.5, its next-generation multimodal large model. Billed as the company's most intelligent and versatile model to date, K2.5 uses a native multimodal architecture: it accepts both visual and text input, supports thinking and non-thinking modes, and handles both dialogue and agent tasks, with leading performance in agent, coding, image, video, and general-intelligence benchmarks.

Model Architecture

Native Multimodal Design

Kimi K2.5 is a 1T-parameter Mixture-of-Experts (MoE) model with approximately 32B parameters activated per token. It was continually pre-trained on roughly 15 trillion mixed visual and text tokens, which gives it truly native multimodal capability.
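
Why can a 1T-parameter model run with only ~32B active parameters? In an MoE layer, a learned router picks a small number of experts per token, so most weights sit idle on any given forward pass. The sketch below shows minimal top-k routing in PyTorch; the layer sizes, expert count, and k are arbitrary placeholders, not Kimi K2.5's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Illustrative top-k MoE routing -- not Kimi K2.5's actual code."""

    def __init__(self, d_model=512, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.gate(x)                       # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(4, 512)
print(TopKMoELayer()(tokens).shape)  # torch.Size([4, 512])
```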

The model uses Moonshot's self-developed MoonViT vision encoder (400M parameters) to integrate visual and language understanding. It accepts both image and video input and performs strongly on visual knowledge, cross-modal reasoning, and agent tool use grounded in visual inputs.

Agent Swarm Mechanism

K2.5 introduces an Agent Swarm mechanism, moving from single-agent scaling to a self-directed, coordinated, swarm-like execution scheme. The model decomposes a complex task into parallel sub-tasks that are executed by dynamically instantiated, domain-specific agents, as sketched below.
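
The announcement does not detail the scheduler, but the fan-out/merge pattern it describes looks roughly like the following asyncio sketch. The decomposition, agent names, and merge step are hypothetical stand-ins, not Moonshot's actual implementation.

```python
import asyncio

async def run_agent(name: str, subtask: str) -> str:
    """Stand-in for a dynamically instantiated, domain-specific agent.
    A real agent would call the model with its own tools and context."""
    await asyncio.sleep(0.1)  # simulate model/tool latency
    return f"[{name}] finished: {subtask}"

async def swarm(task: str) -> str:
    # Hypothetical decomposition step; K2.5 performs this itself.
    subtasks = {
        "researcher": f"gather sources for '{task}'",
        "coder": f"prototype code for '{task}'",
        "reviewer": f"check results for '{task}'",
    }
    # Fan out: run all domain-specific agents concurrently.
    results = await asyncio.gather(
        *(run_agent(name, st) for name, st in subtasks.items())
    )
    # Merge: a real coordinator would synthesize these into one answer.
    return "\n".join(results)

print(asyncio.run(swarm("benchmark report")))
```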

Core Capabilities

Visual Understanding and Code Generation

K2.5 demonstrates excellent visual understanding:

  • Image Understanding: MMMU-Pro score 78.5, CharXiv (RQ) score 77.5
  • Math Vision: MathVision score 84.2, MathVista (mini) score 90.1
  • OCR Capability: OCRBench score 92.3, OmniDocBench 1.5 score 88.8
  • Video Understanding: VideoMMMU score 86.6, VideoMME score 87.4

The model can generate code from visual specifications (UI designs, video workflows) and autonomously orchestrate tools for visual data processing.
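
As an illustration of that image-to-code workflow, the snippet below sends a UI screenshot to an OpenAI-compatible chat endpoint. The base URL, model id, and file name are placeholder assumptions; check Moonshot's API documentation for the real values.

```python
import base64
from openai import OpenAI  # pip install openai

# Placeholder endpoint and model id -- verify against Moonshot's docs.
client = OpenAI(base_url="https://api.moonshot.ai/v1", api_key="YOUR_API_KEY")

with open("ui_mockup.png", "rb") as f:  # placeholder file name
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="kimi-k2.5",  # hypothetical model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text",
             "text": "Generate HTML/CSS that reproduces this UI design."},
        ],
    }],
)
print(response.choices[0].message.content)
```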

Coding Capabilities

K2.5 performs strongly on programming tasks:

  • SWE-Bench Verified: 76.8% (surpassing Gemini 3 Pro)
  • SWE-Bench Multilingual: 73.0% (surpassing GPT 5.2 and Gemini 3 Pro)
  • LiveCodeBench (v6): 85.0%
  • Terminal Bench 2.0: 50.8%

Agent and Search Capabilities

K2.5 demonstrates powerful capabilities in agent and search tasks:

  • BrowseComp: Base score 60.6%, improved to 78.4% with Agent Swarm
  • WideSearch (item-f1): Base score 72.7%, improved to 79.0% with Agent Swarm
  • DeepSearchQA: 77.1%

K2.5 achieved the best results among open-source models worldwide on multiple agent evaluations, including HLE (Humanity's Last Exam), BrowseComp, and DeepSearchQA.

Reasoning and Knowledge

  • HLE-Full: 30.1% (without tools), 50.2% (with tools)
  • AIME 2025: 96.1%
  • HMMT 2025 (Feb): 95.4%
  • GPQA-Diamond: 87.6%
  • MMLU-Pro: 87.1%

Technical Features

Dual Mode Support

K2.5 supports two response modes (see the selection sketch after this list):

  • Instant Mode: Quick response for daily conversations and simple tasks
  • Thinking Mode: Deep reasoning for complex problem-solving
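
How a request selects between the two modes is not specified in the announcement; one plausible pattern, shown below, is a per-request toggle on an OpenAI-compatible endpoint. The `thinking` field passed via `extra_body` is a hypothetical parameter name, and the model id is a placeholder; consult the official API reference for the real switch.

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.moonshot.ai/v1", api_key="YOUR_API_KEY")

def ask(prompt: str, thinking: bool) -> str:
    # "thinking" is a hypothetical parameter name, not a confirmed API field.
    response = client.chat.completions.create(
        model="kimi-k2.5",  # placeholder model id
        messages=[{"role": "user", "content": prompt}],
        extra_body={"thinking": thinking},
    )
    return response.choices[0].message.content

quick = ask("What is the capital of France?", thinking=False)   # Instant Mode
deep = ask("Prove that sqrt(2) is irrational.", thinking=True)  # Thinking Mode
```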

Long Context Capability

  • LongBench v2: 61.0%
  • AA-LCR: 70.0%

The model can effectively process long text and long video content.

Application Scenarios

Kimi K2.5 is particularly suitable for:

  • Visual Programming: Generate code directly from UI design images or video demonstrations
  • Complex Task Automation: Parallel processing of multiple sub-tasks through Agent Swarm
  • Document Understanding: High-precision OCR and document analysis
  • Video Analysis: Long video content understanding and reasoning
  • Intelligent Search: Deep web search and information integration
  • Multimodal Dialogue: Intelligent conversations combining images and videos

Open Source and Availability

Kimi K2.5 is fully open-source, supporting both commercial and non-commercial use. Developers can:

  • Deploy and run the model locally (see the loading sketch after this list)
  • Fine-tune and customize
  • Integrate into various applications
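
A minimal local-loading sketch with Hugging Face transformers follows. The repository id is an assumption, and a 1T-parameter MoE checkpoint needs a multi-GPU server in practice, so treat this as the shape of the workflow rather than a turnkey script.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id -- check Moonshot's Hugging Face page for the real one.
repo = "moonshotai/Kimi-K2.5"

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    device_map="auto",   # shard across available GPUs
    torch_dtype="auto",
    trust_remote_code=True,
)

prompt = "Explain Mixture-of-Experts in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```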

Technical Breakthrough

Moonshot AI founder and CEO Zhilin Yang stated: “We rebuilt the reinforcement learning infrastructure and specifically optimized training algorithms to ensure it achieves ultimate efficiency and performance.”

The release of K2.5 marks an important milestone for multimodal agent models, integrating visual understanding, code generation, and agent collaboration capabilities into a single model, providing a powerful foundation for AI application development.