Microsoft Releases TRELLIS.2 - 4 Billion Parameter Image-to-3D Generation Model

Microsoft recently released TRELLIS.2, a large 3D generative model with 4 billion parameters, specifically designed for high-fidelity image-to-3D generation tasks. The model employs a novel sparse voxel structure called O-Voxel, capable of reconstructing and generating 3D assets with complex topologies, sharp features, and complete PBR materials.

[Image: TRELLIS.2 example]

Key Features

High Quality and Efficiency

TRELLIS.2 uses a sparse 3D VAE with 16× spatial downsampling to encode 3D assets into a compact latent space. The model generates fully textured, high-resolution assets in the following times:

512³ Resolution: Approximately 3 seconds (2s shape + 1s material)
1024³ Resolution: Approximately 17 seconds (10s shape + 7s material)
1536³ Resolution: Approximately 60 seconds (35s shape + 25s material)

These timings were measured on an NVIDIA H100 GPU.
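The figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the TRELLIS.2 codebase; it assumes the 16× downsampling factor applies uniformly along each axis, and the function names are hypothetical.

```python
# Illustrative arithmetic only; not part of the TRELLIS.2 codebase.
DOWNSAMPLE = 16  # 16x spatial downsampling, as stated in the article

# (shape seconds, material seconds) per output resolution, from the list above
TIMINGS = {512: (2, 1), 1024: (10, 7), 1536: (35, 25)}


def latent_side(resolution: int, factor: int = DOWNSAMPLE) -> int:
    """Side length of the latent grid, assuming the factor applies per axis."""
    if resolution % factor:
        raise ValueError("resolution must be divisible by the downsampling factor")
    return resolution // factor


def summarize() -> dict:
    """Map each resolution to (latent side length, total seconds)."""
    return {
        res: (latent_side(res), shape + material)
        for res, (shape, material) in TIMINGS.items()
    }


for res, (side, total) in summarize().items():
    print(f"{res}^3 -> latent grid up to {side}^3, about {total} s in total")
```

Because the representation is sparse, only a fraction of each latent grid would actually be occupied; the side lengths are upper bounds on the grid extent, not counts of stored cells.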

Complex Topology Support

The O-Voxel representation avoids the limitations of traditional iso-surface fields and robustly handles complex structures:

Open Surfaces: Such as clothing, leaves, etc.
Non-manifold Geometry: Such as edges shared by more than two faces
Internal Enclosed Structures: Models containing internal cavities

Rich Material Representation

Beyond basic color information, TRELLIS.2 can model various surface attributes including base color, roughness, metallic, and opacity, enabling photorealistic rendering of generated 3D assets with transparency support.
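To make the four attributes concrete, here is a minimal sketch of how a single set of values could map onto a glTF 2.0 material, the material model used by GLB files. It is an assumption-laden illustration, not TRELLIS.2's export code: the function name `to_gltf_material` is hypothetical, and it uses one uniform value per attribute, whereas generated assets would normally carry per-texel values.

```python
import json


def to_gltf_material(base_color, roughness, metallic, opacity, name="generated"):
    """Build a glTF 2.0 material dict from uniform surface attributes."""
    checks = (("roughness", roughness), ("metallic", metallic), ("opacity", opacity))
    for label, value in checks:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{label} must be in [0, 1], got {value}")
    r, g, b = base_color
    return {
        "name": name,
        "pbrMetallicRoughness": {
            "baseColorFactor": [r, g, b, opacity],  # alpha channel carries opacity
            "metallicFactor": metallic,
            "roughnessFactor": roughness,
        },
        # glTF blends alpha only when alphaMode is BLEND; OPAQUE ignores it
        "alphaMode": "BLEND" if opacity < 1.0 else "OPAQUE",
    }


# Hypothetical values for a glass-like surface
glass = to_gltf_material((0.8, 0.9, 1.0), roughness=0.05, metallic=0.0, opacity=0.3)
print(json.dumps(glass, indent=2))
```

The point of the sketch is that opacity is not a separate channel in glTF: it travels in the alpha component of the base color, and the material must opt into blending for it to have any effect.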

Fast Data Processing

The model’s data processing pipeline converts between textured meshes and O-Voxel near-instantly and requires no rendering or optimization steps:

Textured Mesh to O-Voxel: Less than 10 seconds on a single CPU
O-Voxel to Textured Mesh: Less than 100 milliseconds with CUDA acceleration

Technical Implementation

TRELLIS.2 is built upon several specialized high-performance packages:

O-Voxel: Core library handling the conversion between textured meshes and O-Voxel representation
FlexGEMM: Efficient sparse convolution implementation based on Triton
CuMesh: CUDA-accelerated mesh processing utilities for post-processing, remeshing, simplification, and UV unwrapping

Model Availability

The pretrained TRELLIS.2-4B model is available on Hugging Face, supporting resolutions ranging from 512³ to 1536³. The model and code are released under the MIT license, making it accessible for researchers and developers.

The code requires Linux and an NVIDIA GPU with at least 24 GB of memory. It has been verified on NVIDIA A100 and H100 GPUs.
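A quick pre-flight check against these requirements might look like the sketch below. It is not part of the project: the helper names are hypothetical, and it parses text in the shape produced by `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits`, which reports memory in MiB. The 24,000 MiB threshold is an assumption, chosen leniently because cards sold as 24 GB may report slightly less than 24,576 MiB.

```python
import platform

MIN_MEMORY_MIB = 24_000  # assumption: lenient stand-in for "24 GB"


def parse_gpu_memory(csv_text: str) -> list:
    """Parse `name, memory.total` rows (MiB, no header, no units)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem = line.rsplit(",", 1)  # split on the last comma only
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def meets_requirements(csv_text: str, system: str = "") -> bool:
    """True if running on Linux and at least one GPU has enough memory."""
    system = system or platform.system()
    if system != "Linux":
        return False
    return any(mem >= MIN_MEMORY_MIB for _, mem in parse_gpu_memory(csv_text))


# Illustrative sample rows, not captured from a real machine
sample = "NVIDIA A100-SXM4-40GB, 40960\nTesla T4, 15360"
print(parse_gpu_memory(sample))
print(meets_requirements(sample, system="Linux"))
```

On a real machine, the text would come from running the `nvidia-smi` command above (for example via `subprocess.run`) rather than from a hard-coded string.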

Practical Applications

TRELLIS.2 is particularly suitable for scenarios requiring rapid generation of high-quality 3D assets, such as game development, virtual reality content creation, and product design visualization. The generated 3D assets include complete PBR material information and can be directly exported to GLB format for use in various 3D software and engines.

For non-technical users, the team also provides a web-based demo interface that allows direct image upload for 3D generation without writing code or configuring complex environments.

OpenMOSS Releases MOVA - Open-Source Synchronized Video and Audio Generation Model