title: "Tencent HunyuanWorld Voyager: Generating 3D World Exploration Videos from a Single Image"
description: "Tencent Hunyuan team releases Voyager technology, capable of generating world-consistent 3D point cloud sequence videos from a single image and user-defined camera paths, supporting infinite world exploration and direct 3D reconstruction"
tag: tencent, video
date: 2025-09-05
Tencent HunyuanWorld Voyager: Generating 3D World Exploration Videos from a Single Image
The Tencent Hunyuan team recently released HunyuanWorld-Voyager, a video diffusion framework that generates world-consistent 3D point cloud sequences from a single image and a user-defined camera path. It offers a new approach to 3D scene generation and world exploration.
Technical Features
Voyager's core strength is world-consistent video generation. Compared with existing methods, it offers the following features:
End-to-End Scene Generation: Voyager generates and reconstructs scenes end to end, keeping frames inherently consistent with one another without a separate 3D reconstruction step.
Long-Distance World Exploration: An efficient world cache with point cloud culling, combined with autoregressive inference and smooth video sampling, supports iterative scene expansion while preserving context-aware consistency.
Scalable Data Engine: A video reconstruction pipeline automatically estimates camera poses and predicts metric depth, so large and diverse training sets can be curated without manual 3D annotation.
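The iterative expansion described above can be pictured as a loop over overlapping clips of the camera path, where each clip adds points to a shared world cache. The following is a minimal sketch under stated assumptions, not Voyager's actual code: `split_trajectory`, `placeholder_generate`, and `explore` are hypothetical names, the clip length and overlap are arbitrary, and the diffusion model is replaced by a stand-in that only scatters random points near each camera.

```python
import numpy as np

def split_trajectory(poses, clip_len, overlap):
    """Split a pose sequence into clips sharing `overlap` poses with the previous clip."""
    if not 0 <= overlap < clip_len:
        raise ValueError("overlap must satisfy 0 <= overlap < clip_len")
    step = clip_len - overlap
    clips = []
    start = 0
    while start < len(poses):
        clips.append(poses[start:start + clip_len])
        if start + clip_len >= len(poses):
            break  # the last clip already reaches the end of the path
        start += step
    return clips

def placeholder_generate(cache_points, clip_poses, rng):
    """Hypothetical stand-in for the video diffusion model.

    A real model would be conditioned on the cache; this stub ignores it and
    returns four random 3D points near each camera position in the clip.
    """
    centers = clip_poses[:, :3, 3]
    noise = rng.normal(scale=0.1, size=(len(centers) * 4, 3))
    return centers.repeat(4, axis=0) + noise

def explore(poses, clip_len=8, overlap=2, seed=0):
    """Autoregressive loop: generate clip by clip, growing the world cache."""
    rng = np.random.default_rng(seed)
    cache = np.empty((0, 3))
    for clip in split_trajectory(poses, clip_len, overlap):
        new_points = placeholder_generate(cache, clip, rng)
        cache = np.concatenate([cache, new_points], axis=0)
    return cache

# A straight-line path: 20 camera-to-world poses moving along +z.
poses = np.tile(np.eye(4), (20, 1, 1))
poses[:, 2, 3] = np.linspace(0.0, 5.0, 20)
cache = explore(poses)
```

The overlap between consecutive clips is one simple way to give each new clip some shared context with the previous one; how Voyager actually samples and blends clips is not specified in this article.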
Technical Architecture
Voyager integrates three key components:
- World-Consistent Video Diffusion: A unified architecture that jointly generates aligned RGB and depth video sequences, conditioned on existing world observations to ensure global consistency
- Long-Distance World Exploration: An efficient world caching mechanism containing point cloud culling and autoregressive inference, supporting smooth video sampling for iterative scene expansion
- Scalable Data Engine: A video reconstruction pipeline for automated camera pose estimation and metric depth prediction, supporting large-scale training data curation
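To make the idea of culling a cached point cloud concrete, here is a small sketch that keeps only the cached points visible from a given camera. The article does not describe Voyager's actual culling criterion, so a plain view-frustum test with a pinhole camera is swapped in as an illustration; the function name `cull_to_frustum`, the intrinsics, and the near/far limits are all assumptions.

```python
import numpy as np

def cull_to_frustum(points_world, world_to_cam, K, width, height,
                    near=0.1, far=100.0):
    """Return the points that project inside the camera image, plus the mask."""
    # Transform world-space points into the camera frame.
    ones = np.ones((len(points_world), 1))
    pts_h = np.concatenate([points_world, ones], axis=1)
    pts_cam = (world_to_cam @ pts_h.T).T[:, :3]

    # Depth test: discard points behind the camera or beyond the far limit.
    z = pts_cam[:, 2]
    in_depth = (z > near) & (z < far)

    # Pinhole projection; use a dummy depth for rejected points to avoid
    # dividing by zero or by negative values.
    safe_z = np.where(in_depth, z, 1.0)
    u = K[0, 0] * pts_cam[:, 0] / safe_z + K[0, 2]
    v = K[1, 1] * pts_cam[:, 1] / safe_z + K[1, 2]
    in_image = (u >= 0) & (u < width) & (v >= 0) & (v < height)

    keep = in_depth & in_image
    return points_world[keep], keep

# Illustrative intrinsics for a 100x100 image (assumed values).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 2.0],    # straight ahead: kept
                   [0.0, 0.0, -2.0],   # behind the camera: culled
                   [5.0, 0.0, 2.0],    # far off to the side: culled
                   [0.5, 0.5, 2.0]])   # inside the image: kept
visible, keep = cull_to_frustum(points, np.eye(4), K, width=100, height=100)
```

Restricting the cache to what the next camera can see keeps the conditioning input small as the explored world grows, which is the general motivation for culling in this kind of pipeline.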
Application Scenarios
Potential applications span several fields:
- 3D World Generation: Creating explorable 3D scenes from a single image
- Video Game Development: Rapidly generating game scenes and virtual worlds
- Film Production: Providing 3D scene content for movies and animations
- Robotics Simulation: Providing virtual environments for robot training
- Virtual Reality: Creating immersive VR experience content
Performance
On the WorldScore benchmark, Voyager scored as follows on several evaluation dimensions:
- Camera Control: 85.95 points
- Content Alignment: 68.92 points
- 3D Consistency: 81.56 points
- Subjective Quality: 71.09 points
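Voyager's overall WorldScore average was 77.62 points, ranking first among the compared methods. That average is taken over all of the benchmark's dimensions, not only the four listed above, which on their own would average 76.88.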
Technical Advantages
Compared to traditional 3D generation methods, Voyager has the following advantages:
Avoiding Visual Hallucinations: Using depth as a spatial prior helps avoid the visual hallucinations that can arise when conditioning on RGB alone.
Direct 3D Reconstruction: Voyager generates aligned RGB and depth sequences together, so a 3D scene can be reconstructed directly, without an additional structure-from-motion or multi-view stereo step.
Infinite World Expansion: Voyager supports camera trajectories of arbitrary length, extending the world indefinitely while preserving the original spatial layout.
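Direct reconstruction from aligned depth works because each pixel with a known depth can be lifted back into 3D when the camera parameters are known. The sketch below shows the standard pinhole unprojection for one frame; it is an illustration rather than Voyager's own code, and the function name `unproject_depth`, the intrinsics matrix `K`, and the pose convention (camera-to-world, z forward) are assumptions.

```python
import numpy as np

def unproject_depth(depth, K, cam_to_world):
    """Lift an (H, W) depth map to an (H*W, 3) array of world-space points."""
    h, w = depth.shape
    # Pixel grid: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy.
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]

    # Homogeneous camera-space points, one row per pixel (row-major order).
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)

    # Move the points into the world frame.
    return (cam_to_world @ pts_cam.T).T[:, :3]

# Tiny 2x2 depth map with assumed intrinsics, camera shifted 1 unit along +z.
depth = np.full((2, 2), 2.0)
K = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 1.0]])
pose = np.eye(4)
pose[2, 3] = 1.0
points = unproject_depth(depth, K, pose)
```

Applying this to every generated frame and concatenating the results gives a point cloud for the whole sequence; the RGB frame, flattened in the same row-major order, supplies a color for each point.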
Related Links
Voyager has been open-sourced, with the model hosted on Hugging Face and the code on GitHub. Researchers and developers can access it through the following links:
- Project Page: https://3d-models.hunyuan.tencent.com/world/
- Hugging Face Model: https://huggingface.co/tencent/HunyuanWorld-Voyager
- GitHub Repository: https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager
- Technical Report: https://arxiv.org/abs/2506.04225