
OmniSVG: Fudan University and StepFun Launch Unified Vector Graphics Generation Model

Fudan University and StepFun have jointly released OmniSVG, a unified Scalable Vector Graphics (SVG) generation model. The model generates high-quality vector graphics, ranging from simple icons to complex anime characters, from text, image, or character-reference inputs.

Unlike traditional image generation models, OmniSVG produces infinitely scalable and fully editable SVG files, allowing designers to directly utilize the generated results for post-processing and modification, greatly enhancing the practicality of AI-generated graphics in professional design workflows.

OmniSVG Model Examples

Technical Innovations and Operating Principles

OmniSVG is built on the pre-trained vision-language model (VLM) Qwen-VL and tackles the core challenges of vector graphics generation with a dedicated SVG tokenization scheme. The model parameterizes SVG commands and coordinates as discrete tokens, which decouples structural logic from geometric detail while preserving the ability to express complex SVG structures.

OmniSVG Workflow

This design offers several key advantages:

  • Efficient Generation Process: Training is more than 3 times faster than with traditional methods
  • Long Context Support: Processes sequences of up to 30,000 tokens, supporting the generation of complex SVGs with rich details
  • Multimodal Input Compatibility: Accepts text descriptions, image references, and character references as input

Generation process demonstration:

Generation Process Demo

Multiple Generation Modes

OmniSVG supports multiple generation modes to meet the needs of different application scenarios:

Text-to-SVG Generation

Users can generate semantically appropriate vector graphics through natural language descriptions, such as “a cartoon cat sitting under a cherry blossom tree.”

Text-to-SVG Examples

Image-to-SVG Conversion

OmniSVG automatically converts bitmaps (such as photos or hand-drawn sketches) into vector graphics composed of paths, preserving the visual features of the original image while making the result editable.

Image-to-SVG Examples

Character Reference SVG Generation

Given existing character images, OmniSVG generates vector graphics that preserve the character's features while placing it in different poses or scenarios. This is particularly valuable for animation and game character design.

Character Reference Generation Examples

MMSVG-2M Dataset

To advance SVG generation technology, the research team has open-sourced the MMSVG-2M dataset, the first large-scale multimodal SVG dataset containing 2 million samples covering categories such as icons, illustrations, and character designs.

MMSVG-2M Dataset Visualization

Key features of the MMSVG-2M dataset include:

  • Rich Diversity: Covers a wide range of complexity, from simple icons to complex character designs
  • Multimodal Annotations: Each SVG comes with text descriptions and corresponding bitmap renderings
  • High-Quality Samples: Provides professional-level vector graphic design samples
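As a rough picture of what a multimodal sample with these annotations might look like in code, the Python sketch below models one record. The class name `SvgSample` and the field names `svg`, `description`, and `png_path` are hypothetical and chosen for this example; they are not taken from the published dataset schema.

```python
from dataclasses import dataclass


@dataclass
class SvgSample:
    """One hypothetical record: SVG source plus its two annotations."""
    svg: str          # SVG markup of the graphic
    description: str  # text description of the graphic
    png_path: str     # path to the bitmap rendering of the same graphic

    def is_complete(self) -> bool:
        # A sample is usable for multimodal training only if all three
        # modalities are present and non-empty.
        return all(field.strip() for field in (self.svg, self.description, self.png_path))


sample = SvgSample(
    svg='<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"></svg>',
    description="an empty 24x24 icon canvas",
    png_path="renders/icon_0001.png",  # illustrative path, not a real file
)
```

A check such as `is_complete` is the kind of filter one might run before training so that every example can be used for both text-to-SVG and image-to-SVG tasks.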

Currently, the research team has open-sourced the MMSVG-Icon and MMSVG-Illustration subdatasets on the Hugging Face platform, with the MMSVG-Character dataset planned for release in the near future.

Application Potential and Limitations

Application Scenarios

  • Design Automation: Quickly generate brand icons and illustration materials, reducing manual drawing time
  • Dynamic Content Creation: Batch generate character action sequences in combination with animation tools
  • Cross-Platform Adaptation: Generated vector graphics can be scaled without loss, suitable for various resolutions from mobile devices to 4K displays
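The lossless scaling mentioned above follows from how SVG works: the geometry lives in a `viewBox` coordinate system, and the output size is just an attribute on the root element. The Python sketch below uses only the standard library to retarget a hand-written example icon to different display sizes. The icon and the helper name `resize_svg` are made up for illustration and are not output of OmniSVG.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep output free of "ns0:" prefixes

# Hand-written example icon; all geometry is defined in a 24x24 viewBox.
ICON = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">'
    '<circle cx="12" cy="12" r="10" fill="#e91e63"/>'
    "</svg>"
)


def resize_svg(svg_text: str, width: int, height: int) -> str:
    """Return the same SVG with a new display size; paths are untouched."""
    root = ET.fromstring(svg_text)
    root.set("width", str(width))
    root.set("height", str(height))
    return ET.tostring(root, encoding="unicode")


phone_icon = resize_svg(ICON, 48, 48)
display_4k = resize_svg(ICON, 3840, 2160)
```

Both results contain the identical circle definition; only the `width` and `height` attributes differ, so the renderer rasterizes the shape at the target resolution instead of upscaling pixels.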

Current Limitations

  • Generation Speed: Complex samples require generating tens of thousands of tokens, resulting in longer inference times (e.g., 139 seconds to generate an anime character)
  • Style Generalization: Image-to-SVG conversion quality is limited for input images whose style is not represented in the training set; improving it will require incorporating more multi-style data

Open Source Plans and Resources

The research team has open-sourced the MMSVG-Icon and MMSVG-Illustration datasets and plans to open-source the model code and pre-trained weights in the near future. Open-sourcing the OmniSVG project will give the SVG generation field a new technical paradigm and help make design tools more intelligent.

The release of OmniSVG marks an important advancement in vector graphics generation technology, bringing new possibilities to fields such as graphic design, UI/UX creation, and visual content production, while also providing a new direction for the integration of AI-generated content into professional design workflows.