
Nari Labs Releases Dia 1.6B Text-to-Dialogue Speech Model

Dia Banner

Nari Labs has recently released an open-source text-to-speech (TTS) model called Dia. Unlike conventional TTS models, Dia is a 1.6B parameter model specifically designed for dialogue generation, capable of producing highly realistic multi-character conversations directly from text scripts.

Key Features

The Dia model offers the following features:

  • Generates multi-character dialogues from a single text script
  • Enables emotion and tone control through audio conditioning
  • Produces non-verbal communications like laughter, coughing, throat clearing, and other natural vocal expressions
  • Provides open-source weights and inference code for research and applications

Currently, the Dia model only supports English speech generation.

Try It Online

You can experience the Dia model directly through the Hugging Face space below:

Nari Dia 1.6B Demo

How to Use

Using Dia to generate dialogues is straightforward. You simply need to format your dialogue text as follows:

  • Use [S1] and [S2] tags to distinguish different speakers
  • Place non-verbal expressions in parentheses, such as (laughs), (coughs), etc.
  • Provide audio samples to clone a specific voice

The model generates different voices with each run, but you can maintain voice consistency by adding audio prompts or fixing the random seed.
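The script format above is easy to produce programmatically. The sketch below is a small, hypothetical helper (not part of the Dia library) that assembles a list of speaker turns into a Dia-style script, with [S1]/[S2] speaker tags and non-verbal cues left inline in parentheses; the resulting string is what you would pass to the model's text input.

```python
def format_dia_script(turns):
    """Format (speaker_index, text) pairs into a Dia-style dialogue script.

    speaker_index is 1 or 2 and becomes an [S1]/[S2] tag; non-verbal cues
    such as (laughs) or (coughs) are written inline in the text itself.
    """
    return " ".join(f"[S{speaker}] {text}" for speaker, text in turns)

script = format_dia_script([
    (1, "Did you try the demo?"),
    (2, "I did! (laughs)"),
])
print(script)
# → [S1] Did you try the demo? [S2] I did! (laughs)
```

Feeding this string to the model follows the Nari Labs repository's inference code; the helper here only covers the text-formatting convention described above.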

Hardware Requirements

The Dia model currently only supports GPU inference (requires PyTorch 2.0+ and CUDA 12.6) and can achieve 2x real-time generation speed on an RTX 4090. The team plans to add CPU support and quantized versions in the future.

The Dia model was developed by Nari Labs, where “Nari” is the Korean word for lily. The team consists of one full-time and one part-time research engineer and received computational resources support from the Google TPU Research Cloud program.