Higgs TTS 3: Boson AI's 4B Multilingual Speech Model with 100+ Language Support

Higgs TTS 3 is a 4B-parameter multilingual text-to-speech model from Boson AI. It supports over 100 languages and offers expressive speech generation, zero-shot voice cloning, and fine-grained control over emotion, prosody, and sound effects.

Overview

Released by Boson AI on June 4, 2026, Higgs TTS 3 (model ID: bosonai/higgs-tts-3-4b) is a 4-billion-parameter text-to-speech model designed for voice agent and conversational AI applications. Unlike traditional TTS systems that merely "read" text, Higgs TTS 3 is built to "speak": it generates expressive, natural conversational speech with emotional nuance.

The model is built on a Higgs multimodal architecture based on Qwen3, with an autoregressive decoder that consumes interleaved text and audio tokens. The Higgs Tokenizer encodes audio into 8 codebooks at 25 frames per second, arranged in a staggered delay pattern, and the generated codes are then decoded back into a high-quality waveform.
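To make the codebook layout more concrete, here is a minimal sketch of a staggered delay pattern in plain Python. The article gives only the codebook count (8) and the frame rate (25 fps). The one-frame offset per codebook, the padding value, and all function names below are assumptions for illustration, not details of the actual Higgs Tokenizer.

```python
NUM_CODEBOOKS = 8    # from the article
FRAME_RATE_HZ = 25   # from the article
PAD = -1             # hypothetical padding id, used only in this sketch


def apply_delay_pattern(codes, pad=PAD):
    """Shift codebook k right by k frames (assumed one-frame stagger).

    `codes` is a list of equal-length rows, one row per codebook.
    """
    num_frames = len(codes[0])
    width = num_frames + len(codes) - 1
    delayed = []
    for k, row in enumerate(codes):
        right_pad = width - k - num_frames
        delayed.append([pad] * k + list(row) + [pad] * right_pad)
    return delayed


def undo_delay_pattern(delayed):
    """Invert apply_delay_pattern, recovering the time-aligned codes."""
    num_codebooks = len(delayed)
    num_frames = len(delayed[0]) - (num_codebooks - 1)
    return [row[k:k + num_frames] for k, row in enumerate(delayed)]


def audio_token_count(seconds):
    """Number of codebook entries for a clip: frames times codebooks."""
    return round(seconds * FRAME_RATE_HZ) * NUM_CODEBOOKS


if __name__ == "__main__":
    # Three frames of dummy codes, one row per codebook.
    codes = [[k * 10 + t for t in range(3)] for k in range(NUM_CODEBOOKS)]
    for row in apply_delay_pattern(codes):
        print(row)
    print(audio_token_count(1.0), "codebook entries per second of audio")
```

With these figures, one second of audio corresponds to 25 frames across 8 codebooks, or 200 codebook entries. Under the assumed stagger, each decoding step can emit one entry per codebook, with deeper codebooks lagging the first by a few frames.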

Key Features

Feature         Description
Parameters      4 billion
Languages       100+ (extensive multilingual coverage)
Architecture    Higgs multimodal Qwen3-based autoregressive decoder
Voice Cloning   Zero-shot voice cloning from reference audio
Control         21 emotions, 10 prosody controls, inline sound effects
License         Research and non-commercial
Library         Transformers (Hugging Face)

Multilingual Support

Higgs TTS 3 supports over 100 languages. Grouped by region, they include:

  • European: English, Spanish, French, German, Italian, Portuguese, Russian, Polish, Dutch, Swedish, Norwegian, Danish, Finnish, Greek, Czech, Romanian, Hungarian, Ukrainian, and many more
  • Asian: Chinese (Mandarin/Cantonese), Japanese, Korean, Hindi, Bengali, Tamil, Telugu, Urdu, Vietnamese, Thai, Indonesian, Malay, Burmese, Khmer, Lao, and more
  • Middle Eastern / African: Arabic, Hebrew, Turkish, Persian (Farsi), Swahili, Amharic, Hausa, Yoruba, Igbo, Zulu, Xhosa, and more
  • Other: Tagalog, Nepali, Sinhala, Georgian, Armenian, Azerbaijani, Kazakh, Uzbek, and many more

Expressive Control

Higgs TTS 3 provides fine-grained control over speech output through inline control tags embedded in the input text:

21 Emotions (sentence-level)

affection, amusement, anger, arousal, awe, bitterness, confusion, contemplation, contentment, determination, disgust, elation, enthusiasm, fear, helplessness, longing, pride, relief, sadness, shame, surprise

10 Prosody Controls

  • Speed: speed_very_slow, speed_slow, speed_fast, speed_very_fast
  • Pitch: pitch_low, pitch_high
  • Expressiveness: expressive_more, expressive_less
  • Pauses: pause, long_pause

Inline Sound Effects

Sound effects can be triggered inline: cough, laughter, sigh, applause, bell, knock, and many more.

Example Usage

<|emotion:elation|>Welcome aboard, we are absolutely thrilled to have you here!
<|sfx:cough|>Ahem, let me begin today's presentation.
<|style:whispering|>Come closer, I have a little secret to share.
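The tag syntax in the examples above can be produced programmatically. The following is a minimal sketch that prefixes text with an emotion or sound-effect tag and validates emotion names against the 21 listed in this article. The helper names (make_tag, with_emotion, with_sfx) are hypothetical and are not part of any Boson AI or Transformers API. Sound-effect names are not validated because the article's list is open-ended.

```python
# The 21 sentence-level emotions listed in the article.
EMOTIONS = frozenset({
    "affection", "amusement", "anger", "arousal", "awe", "bitterness",
    "confusion", "contemplation", "contentment", "determination",
    "disgust", "elation", "enthusiasm", "fear", "helplessness",
    "longing", "pride", "relief", "sadness", "shame", "surprise",
})


def make_tag(kind, value):
    """Format a control tag in the <|kind:value|> form shown in the examples."""
    return f"<|{kind}:{value}|>"


def with_emotion(text, emotion):
    """Prefix a sentence with an emotion tag, rejecting unlisted emotions."""
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion!r}")
    return make_tag("emotion", emotion) + text


def with_sfx(text, effect):
    """Prefix text with a sound-effect tag (effect names are not checked)."""
    return make_tag("sfx", effect) + text


if __name__ == "__main__":
    print(with_emotion("Welcome aboard, we are absolutely thrilled to have you here!", "elation"))
    print(with_sfx("Ahem, let me begin today's presentation.", "cough"))
```

Validating names before synthesis catches typos such as "happy" early, instead of passing an unrecognized tag through to the model.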

Zero-Shot Voice Cloning

The model supports zero-shot voice cloning from a short reference audio sample, allowing it to synthesize speech in a target voice without any fine-tuning. This makes it suitable for:

  • Voice agent applications with consistent character voices
  • Multilingual content creation in a single voice
  • Personalized speech synthesis

Availability

Higgs TTS 3 is released for research and non-commercial use under a custom license. Prohibited uses include voice cloning without consent, impersonation, fraud, election deception, and biometric surveillance.

Summary

Higgs TTS 3 combines a 4B-parameter backbone with coverage of 100+ languages, tag-based control over emotion, prosody, and sound effects, and zero-shot voice cloning. For developers building voice agents or multilingual speech applications, it is an open-weight option for research and non-commercial use.