
CLIP Text Encode Hunyuan DiT

Overview of CLIP Text Encode Hunyuan DiT ComfyUI Node

The main functions of the CLIPTextEncodeHunyuanDiT node are:

  • Tokenization: Converting input text into token sequences that can be processed by the model.
  • Encoding: Using the CLIP model to encode token sequences into conditional encodings.

This node can be viewed as a "language translator" that converts user input text (whether English or another language) into "machine language" that AI models can understand, enabling the model to generate corresponding content based on these conditions.

Class Name

  • Class Name: CLIPTextEncodeHunyuanDiT
  • Category: advanced/conditioning
  • Output Node: False

CLIP Text Encode Hunyuan DiT Input Types

Parameter | Comfy Data Type | Description
clip | CLIP | A CLIP model instance used for text tokenization and encoding; core to generating the conditions.
bert | STRING | Text input for encoding; supports multiline and dynamic prompts.
mt5xl | STRING | A second text input for encoding; supports multiline and dynamic prompts (multilingual).
  • bert parameter: Intended for English text input. It's recommended to provide concise text with enough context to help the node generate more accurate and meaningful token representations.
  • mt5xl parameter: Intended for multilingual text input. Text in any language can be supplied to help the model handle multilingual tasks. Example values for both inputs are shown in the sketch below.
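
A minimal usage sketch showing how the two text inputs might be filled and passed to the node's encode method. It assumes clip is a Hunyuan DiT-compatible CLIP object already produced elsewhere in the workflow (for example by a loader node); the variable names and prompt text are illustrative only.

node = CLIPTextEncodeHunyuanDiT()

# `clip` is assumed to come from another node in the workflow.
bert_text = "A cozy cabin in a snowy forest at dusk, warm light in the windows"
mt5xl_text = "雪夜森林中的小木屋，窗户里透出温暖的灯光"  # multilingual input, e.g. Chinese

(conditioning,) = node.encode(clip, bert=bert_text, mt5xl=mt5xl_text)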

CLIP Text Encode Hunyuan DiT Output Types

Parameter | Comfy Data Type | Description
conditioning | CONDITIONING | The encoded conditioning output, used for further processing in generation tasks.
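
The CONDITIONING value is normally just wired into downstream nodes (such as a sampler) rather than inspected directly. For orientation, ComfyUI conditioning is generally a list of [tensor, extras-dict] pairs; a minimal inspection sketch, continuing the illustrative variables from the input sketch above and assuming that layout:

# Each entry pairs an encoded tensor with a dict of extra model inputs
# (the exact keys depend on the model).
for cond_tensor, extras in conditioning:
    print(cond_tensor.shape, list(extras.keys()))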

Methods

  • Encoding Method: encode

    This method accepts clip, bert, and mt5xl as parameters. It first tokenizes bert, then tokenizes mt5xl, and merges the results into a single tokens dictionary. Finally, it calls clip.encode_from_tokens_scheduled to encode the combined tokens into conditioning.
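
    The same flow, restated as an annotated sketch; this mirrors the source listing at the bottom of this page rather than introducing anything new.

def encode(self, clip, bert, mt5xl):
    # 1. Tokenize the English-oriented prompt; for a multi-encoder CLIP this
    #    yields a dict of token sequences keyed by the underlying text encoders.
    tokens = clip.tokenize(bert)
    # 2. Tokenize the multilingual prompt and keep only its "mt5xl" entry,
    #    merging it into the same token dict.
    tokens["mt5xl"] = clip.tokenize(mt5xl)["mt5xl"]
    # 3. Encode the combined tokens into a CONDITIONING value.
    return (clip.encode_from_tokens_scheduled(tokens),)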

Usage Examples

  • To be updated

Extended Content for CLIP Text Encode Hunyuan DiT Node

BERT (Bidirectional Encoder Representations from Transformers)

BERT is a bidirectional language representation model based on the Transformer architecture.

It learns rich contextual information through pre-training on large amounts of text data, then fine-tunes on downstream tasks to achieve high performance.

Key Features:

  • Bidirectionality: BERT considers both left and right context information simultaneously, enabling better understanding of word meanings.

  • Pre-training and Fine-tuning: Through pre-training tasks (like Masked Language Model and Next Sentence Prediction), BERT can be quickly fine-tuned for various downstream tasks.
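
As an illustration of the masked-language-model idea (separate from ComfyUI itself), a minimal sketch using the Hugging Face transformers library, assuming it and the bert-base-uncased checkpoint are available:

from transformers import pipeline

# Predict the masked token from both left and right context.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill_mask("The capital of France is [MASK]."):
    print(candidate["token_str"], round(candidate["score"], 3))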

Application Scenarios:

  • Text Classification

  • Named Entity Recognition

  • Question Answering Systems

mT5-XL (Multilingual Text-to-Text Transfer Transformer)

mT5-XL is the multilingual version of the T5 model, using an encoder-decoder architecture that supports processing multiple languages.

It unifies all NLP tasks as text-to-text transformations, capable of handling various tasks including translation, summarization, and question answering.

Key Features:

  • Multilingual Support: mT5-XL supports processing of up to 101 languages.

  • Unified Task Representation: Converting all tasks into text-to-text format, simplifying the task processing pipeline.
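
A minimal sketch of this text-to-text interface using the Hugging Face transformers library and the smaller google/mt5-small checkpoint (the released mT5 checkpoints are pre-trained only, so prompts like this generally need task-specific fine-tuning before they produce useful output):

from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Every task is phrased as plain text in, plain text out.
inputs = tokenizer("summarize: ComfyUI is a node-based interface for diffusion models.", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))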

Application Scenarios:

  • Machine Translation

  • Text Summarization

  • Question Answering Systems

BERT and mT5-XL Research Papers

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    • Description: This foundational paper introduces BERT, a transformer-based model that achieves state-of-the-art results on a wide array of NLP tasks.
  2. mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

    • Description: This paper presents mT5, a multilingual variant of T5, trained on a new Common Crawl-based dataset covering 101 languages.
  3. mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences

    • Description: This work develops mLongT5, a multilingual model designed to handle longer input sequences efficiently.
  4. Bridging Linguistic Barriers: Inside Google's mT5 Multilingual Technology

    • Description: An article discussing the capabilities and applications of Google's mT5 model in multilingual NLP tasks.
  5. BERT-related Papers

    • Description: A curated list of research papers related to BERT, including surveys, downstream tasks, and modifications.

Source Code

  • ComfyUI version: v0.3.10
  • 2025-01-07
class CLIPTextEncodeHunyuanDiT:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {
            "clip": ("CLIP", ),
            "bert": ("STRING", {"multiline": True, "dynamicPrompts": True}),
            "mt5xl": ("STRING", {"multiline": True, "dynamicPrompts": True}),
            }}
    RETURN_TYPES = ("CONDITIONING",)
    FUNCTION = "encode"
 
    CATEGORY = "advanced/conditioning"
 
    def encode(self, clip, bert, mt5xl):
        tokens = clip.tokenize(bert)
        tokens["mt5xl"] = clip.tokenize(mt5xl)["mt5xl"]
 
        return (clip.encode_from_tokens_scheduled(tokens), )