Qwen3 Text Encoder (Nunchaku INT4)


tonera/Qwen3-text-Nunchaku is a quantized Qwen3 text encoder for the FLUX.2 klein family. It can be used as a drop-in replacement for the pipeline's `text_encoder`.

Note: As of 2026-04-09, the Nunchaku PR adding this functionality has not yet been merged into the official main branch. To try it early, pull and merge the code from nunchaku-ai/nunchaku#927.

Usage

```python
import torch
from diffusers import Flux2KleinPipeline

from nunchaku import NunchakuQwenEncoderModel

# Load the INT4-quantized Qwen3 text encoder
text_encoder = NunchakuQwenEncoderModel.from_pretrained(
    "tonera/Qwen3-text-Nunchaku/svdq-int4-Qwen3-text-Nunchaku.safetensors",
    device="cuda",
    torch_dtype=torch.bfloat16,
)
# Swap it in for the pipeline's default text encoder
pipeline = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-9B",
    text_encoder=text_encoder,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipeline(
    "A cat holding a sign that says hello world",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
image.save("flux2-klein-qwen3-text.png")
```

If you also use a Nunchaku-quantized transformer, pass it through the usual `transformer=` argument of `Flux2KleinPipeline`, just as the text encoder is passed above.

Quantization quality

The following metrics are taken from hidden_states_last in data.json:

| Metric | Value |
|---|---|
| Cosine similarity | 0.995844 |
| Relative L2 error | 0.091183 |
| Max absolute error | 6.539062 |
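
As a rough illustration, metrics like the three above can be computed by comparing the quantized encoder's hidden states against a full-precision reference. This is a minimal NumPy sketch; the function name and layout are illustrative, not part of this release:

```python
import numpy as np

def quantization_metrics(reference, quantized):
    """Compare a quantized tensor against its full-precision reference.

    Both inputs are flattened to 1-D before comparison.
    """
    ref = np.asarray(reference, dtype=np.float64).ravel()
    qnt = np.asarray(quantized, dtype=np.float64).ravel()
    # Cosine similarity between the two flattened tensors
    cosine = float(np.dot(ref, qnt) / (np.linalg.norm(ref) * np.linalg.norm(qnt)))
    # L2 norm of the error, relative to the reference norm
    rel_l2 = float(np.linalg.norm(ref - qnt) / np.linalg.norm(ref))
    # Largest element-wise deviation
    max_abs = float(np.max(np.abs(ref - qnt)))
    return {
        "cosine_similarity": cosine,
        "relative_l2": rel_l2,
        "max_abs_error": max_abs,
    }
```

A value of cosine similarity near 1 and a small relative L2 error indicate the quantized encoder's outputs stay close to the full-precision ones, while the max absolute error captures the worst single-element deviation.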
