
StarVector: A New Tool for Generating SVG Code from Images and Text

StarVector, an open-source project, was recently accepted at CVPR 2025, a leading computer vision conference. The project uses a multimodal large language model to generate SVG (Scalable Vector Graphics) code automatically from images or text descriptions.

StarVector Main Visual

Online Demo

You can try StarVector’s image-to-SVG conversion in the interactive StarVector online demo. There you can upload your own images, watch the SVG being generated in real time, and copy the generated SVG code.

What is StarVector?

StarVector is a multimodal vision-language model designed specifically for SVG generation. It supports two main tasks:

  1. Image-to-SVG: Converts bitmap images into SVG vector code.
  2. Text-to-SVG: Generates corresponding SVG graphics based on text descriptions.

Unlike traditional vectorization tools, StarVector does not simply fit curves to pixel contours. It interprets the semantic structure of an image and chooses appropriate SVG primitives (such as circles, polygons, and text), which yields more concise and precise vector code.
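To make the difference concrete, here is a minimal sketch that describes the same circle twice: once as a single SVG primitive, and once as a polyline path that loosely imitates what a contour tracer might emit. The function names and the 32-segment approximation are illustrative assumptions, not output from StarVector or from any particular tracing tool.

```python
import math

def circle_primitive(cx, cy, r):
    """One semantic SVG primitive describing a circle."""
    return f'<circle cx="{cx}" cy="{cy}" r="{r}"/>'

def circle_as_traced_path(cx, cy, r, segments=32):
    """The same circle approximated by short line segments
    (a rough stand-in for contour-tracing output)."""
    points = []
    for i in range(segments):
        angle = 2 * math.pi * i / segments
        x = cx + r * math.cos(angle)
        y = cy + r * math.sin(angle)
        points.append(f"{x:.1f},{y:.1f}")
    return '<path d="M' + " L".join(points) + ' Z"/>'

primitive = circle_primitive(50, 50, 40)
traced = circle_as_traced_path(50, 50, 40)

print(primitive)
print(f"primitive: {len(primitive)} characters, traced path: {len(traced)} characters")
```

Changing the radius of the primitive means editing one attribute, whereas the traced version requires recomputing every point.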

Technical Architecture and Working Principle

StarVector uses a multimodal architecture that reframes image vectorization as a code generation task. The research team built the model on StarCoder, a large language model for code generation, so that it works directly in the SVG code space.

StarVector Architecture Diagram

For image-to-SVG conversion, a visual encoder first projects the image into visual tokens, and the model then generates the corresponding SVG code from them. For text-to-SVG generation, the model receives only a text instruction (no image is needed) and creates an entirely new SVG graphic.
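The idea of turning an image into a sequence of visual tokens can be illustrated with a toy sketch. It assumes a ViT-style scheme in which the image is cut into square patches and each flattened patch is linearly projected to an embedding; the article does not specify how StarVector’s encoder works internally, so the patch size, the weights, and both function names are hypothetical.

```python
def patchify(image, patch):
    """Split a 2D grayscale image (list of rows) into flattened square patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([image[y][x]
                            for y in range(top, top + patch)
                            for x in range(left, left + patch)])
    return patches

def project(flat_patch, weights):
    """Linearly project one flattened patch; one weight column per embedding dimension."""
    return [sum(v * w for v, w in zip(flat_patch, column)) for column in weights]

# A 4x4 grayscale "image" with two bright 2x2 quadrants.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
# Hypothetical projection weights producing 2-dimensional embeddings.
weights = [[0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, 0.0]]

visual_tokens = [project(p, weights) for p in patchify(image, patch=2)]
print(len(visual_tokens), "visual tokens:", visual_tokens)
```

In a real model these embeddings would be placed in front of the SVG token sequence, so that the language model generates code conditioned on the image.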

StarVector’s training employs a two-stage method:

  • Pre-training Stage: The model learns to map images to SVG code on the SVG-Stack dataset, which contains 2.1 million samples. This stage teaches it to handle a wide range of vector graphic elements.
  • Fine-tuning Stage: The model is further trained on domain-specific datasets (such as SVG-Fonts and SVG-Icons) to improve its performance on specific tasks.

The project offers two model versions to meet different needs:

  • StarVector-1B: 1 billion parameters. Suited to resource-limited environments, trading some quality for efficiency.
  • StarVector-8B: 8 billion parameters. Delivers the highest generation quality, for cases where output quality matters most.
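As a rough guide to what "resource-limited" means here, the sketch below estimates the memory needed just to hold each model’s weights. It assumes the nominal parameter counts above and 16-bit weights (2 bytes per parameter); activations, the attention cache, and framework overhead are not counted, so real requirements will be higher.

```python
def weight_memory_gib(num_params, bytes_per_param=2):
    """Memory for the weights alone, in GiB (2 bytes per parameter = 16-bit)."""
    return num_params * bytes_per_param / 1024**3

# Nominal parameter counts taken from the model names (an approximation).
models = [("StarVector-1B", 1e9), ("StarVector-8B", 8e9)]

for name, params in models:
    print(f"{name}: about {weight_memory_gib(params):.1f} GiB of weights at 16-bit precision")
```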

Key Technological Innovations

StarVector introduces two main technical innovations:

Semantic Understanding and Compact Expression

Traditional vectorization methods (such as AutoTrace and Potrace) rely mainly on curve fitting and have no understanding of image semantics, so they often produce long paths that are hard to edit. StarVector analyzes the image multimodally and directly generates semantically meaningful SVG primitives (such as <circle> and <text>), resulting in more concise and editable code.

Innovative Evaluation Metrics

The project introduces evaluation metrics designed specifically for vector graphics, such as DinoScore. Traditional pixel-level metrics (like MSE) cannot accurately capture the topological structure of vector graphics; the new metrics bring evaluation results closer to human visual perception.
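One way a pixel-level metric can mislead is shown in the toy example below, which computes plain MSE on tiny synthetic images. A line drawn one pixel off target scores worse than an empty canvas, even though a human would judge the shifted line as far closer to the original. The images and helper names are made up for illustration, and DinoScore itself is not implemented here.

```python
def mse(a, b):
    """Mean squared error between two equally sized grayscale images."""
    total = sum((pa - pb) ** 2
                for row_a, row_b in zip(a, b)
                for pa, pb in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))

def vertical_line(size, column):
    """A size x size image containing a one-pixel-wide vertical line."""
    return [[1.0 if x == column else 0.0 for x in range(size)]
            for _ in range(size)]

target = vertical_line(8, 3)
shifted = vertical_line(8, 4)           # same shape, moved one pixel right
blank = [[0.0] * 8 for _ in range(8)]   # the shape is missing entirely

print("shifted line vs target:", mse(target, shifted))  # 0.25
print("blank image vs target: ", mse(target, blank))    # 0.125
```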

Comparison with Existing Methods

On the SVG-Bench benchmark, the StarVector models (especially the 8B version) significantly outperform existing vectorization methods:

StarVector Effect Comparison

The comparison chart shows how different methods handle a variety of images. The SVG code generated by StarVector is more concise and captures the structure and semantics of the original image more accurately. The results are visually cleaner, and the generated code is easier to edit afterwards.

Datasets and Benchmark Testing

To train and evaluate StarVector, the research team created two important resources:

SVG-Stack Dataset

A large-scale, diverse dataset of 2.1 million samples covering vector graphics such as icons, charts, and fonts. It lets the model learn to handle a wide range of SVG primitives and generalize well across different types of graphics.

SVG-Bench Evaluation Benchmark

A comprehensive evaluation benchmark containing 10 sub-datasets, covering three main tasks:

  • Image-to-SVG generation
  • Text-to-SVG generation
  • Chart generation

Each sub-dataset has different characteristics and difficulties, making the evaluation results more comprehensive and reliable.

Application Scenarios and Limitations

StarVector is particularly suitable for the following application scenarios:

  • Web and UI Design: Efficiently converting icons, buttons, and other interface elements.
  • Technical Charts and Flowcharts: Converting hand-drawn or raster charts into editable vector formats.
  • Font and Logo Design: Converting sketches or bitmap logos into precise vector versions.
  • Data Visualization: Providing clear, scalable vector representations for charts and graphics.

The current version of StarVector has the following limitations:

  • It performs poorly on natural images (such as landscapes and portraits) because the training data does not include complex textures and lighting information.
  • It may oversimplify highly complex illustrations.
  • Processing very large images may require longer inference times.

Deployment and Usage

StarVector offers various deployment options to accommodate different usage scenarios:

Hugging Face API

The project provides a ready-to-use model interface on Hugging Face that can be integrated quickly into existing projects. Users can convert images to SVG code with a few API calls.
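Because the SVG is produced by a language model, it can be worth checking the result before using it, for example in case generation stopped before the closing tag. The sketch below assumes the generated code arrives as a plain Python string; the inference call itself is not shown, and check_svg is a hypothetical helper, not part of the StarVector API.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

def check_svg(svg_text):
    """Return (ok, detail): whether svg_text is well-formed XML with an <svg> root."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError as err:
        return False, f"not well-formed XML: {err}"
    tag = root.tag.replace(SVG_NS, "")
    if tag != "svg":
        return False, f"unexpected root element <{tag}>"
    # root.iter() yields the root itself plus every descendant.
    descendants = len(list(root.iter())) - 1
    return True, f"{descendants} descendant element(s)"

complete = ('<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
            '<circle cx="50" cy="50" r="40"/></svg>')
truncated = complete[:60]  # simulates output that was cut off mid-generation

print(check_svg(complete))
print(check_svg(truncated))
```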

vLLM Accelerated Backend

The vLLM backend speeds up inference with PagedAttention and supports high-concurrency workloads such as batch image processing. This deployment option is well suited to production environments that must handle large numbers of images.
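The memory idea behind PagedAttention can be sketched with simple accounting: instead of reserving one contiguous cache region per request sized for the maximum output length, the cache is stored in small fixed-size blocks that are allocated only as tokens are produced. The code below is a toy model of that bookkeeping, not vLLM’s implementation; the request lengths, the block size of 16, and the maximum length of 4000 are illustrative assumptions.

```python
import math

def contiguous_slots(sequence_lengths, max_length):
    """Cache slots reserved when every request preallocates max_length tokens."""
    return len(sequence_lengths) * max_length

def paged_slots(sequence_lengths, block_size):
    """Cache slots reserved when each request holds only the blocks it has started filling."""
    return sum(math.ceil(n / block_size) * block_size for n in sequence_lengths)

# Hypothetical SVG token counts for four concurrent requests.
lengths = [120, 900, 45, 2300]

reserved_contiguous = contiguous_slots(lengths, max_length=4000)
reserved_paged = paged_slots(lengths, block_size=16)

print("contiguous preallocation:", reserved_contiguous, "slots")
print("paged allocation:        ", reserved_paged, "slots")
```

With less memory tied up per request, more requests fit on the same hardware at once, which is where the benefit for batch processing comes from.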

Local Deployment and Demonstration

The project provides complete deployment guides and a Gradio demo interface, so users can run the model locally and view results in real time. The demo interface supports uploading images or entering text, and comparing the outputs of different models.

Open Source Contributions and Future Development

The StarVector project is fully open source on GitHub under the Apache 2.0 license, including the complete code, pre-trained models, and evaluation tools. The research team has also released the SVG-Stack and SVG-Bench datasets for training and evaluation, an important resource for research on vector graphic generation.

In the future, the research team plans to further improve StarVector in the following areas:

  • Enhance the ability to process natural images.
  • Provide finer-grained control options, letting users set parameters during generation.
  • Optimize model performance to reduce inference time and resource requirements.
  • Expand to more application scenarios, such as 3D model generation and dynamic SVG creation.