Qwen3 Text Encoder (Nunchaku INT4)


tonera/Qwen3-text-Nunchaku is a quantized Qwen3 text encoder for the FLUX.2 klein family. It can be used as a drop-in replacement for the pipeline's `text_encoder`.

Note: As of 2026-04-09, the Nunchaku PR adding this functionality has not yet been merged into the official main branch. To try it early, pull and merge the code from nunchaku-ai/nunchaku#927.

Usage

```python
import torch
from diffusers import Flux2KleinPipeline

from nunchaku import NunchakuQwenEncoderModel

# Load the INT4-quantized Qwen3 text encoder
text_encoder = NunchakuQwenEncoderModel.from_pretrained(
    "tonera/Qwen3-text-Nunchaku/svdq-int4-Qwen3-text-Nunchaku.safetensors",
    device="cuda",
    torch_dtype=torch.bfloat16,
)
# Swap it in for the pipeline's default text encoder
pipeline = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-9B",
    text_encoder=text_encoder,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipeline(
    "A cat holding a sign that says hello world",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
image.save("flux2-klein-qwen3-text.png")
```

If you also use a Nunchaku-quantized transformer, pass it through the usual `transformer=` argument of `Flux2KleinPipeline`, just as the text encoder is passed above.

Quantization quality

The following metrics are taken from hidden_states_last in data.json:

| Metric | Value |
|---|---|
| Cosine similarity | 0.995844 |
| Relative L2 error | 0.091183 |
| Max absolute error | 6.539062 |
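
As a rough illustration, metrics like the three above can be computed by comparing the quantized encoder's hidden states against a full-precision reference. This is a minimal NumPy sketch; the function name and layout are illustrative, not part of this release:

```python
import numpy as np

def quantization_metrics(reference, quantized):
    """Compare a quantized tensor against its full-precision reference.

    Both inputs are flattened to 1-D before comparison.
    """
    ref = np.asarray(reference, dtype=np.float64).ravel()
    qnt = np.asarray(quantized, dtype=np.float64).ravel()
    # Cosine similarity between the two flattened tensors
    cosine = float(np.dot(ref, qnt) / (np.linalg.norm(ref) * np.linalg.norm(qnt)))
    # L2 norm of the error, relative to the reference norm
    rel_l2 = float(np.linalg.norm(ref - qnt) / np.linalg.norm(ref))
    # Largest element-wise deviation
    max_abs = float(np.max(np.abs(ref - qnt)))
    return {
        "cosine_similarity": cosine,
        "relative_l2": rel_l2,
        "max_abs_error": max_abs,
    }
```

A value of cosine similarity near 1 and a small relative L2 error indicate the quantized encoder's outputs stay close to the full-precision ones, while the max absolute error captures the worst single-element deviation.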
