This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.

Pipeline Type: Flux2KleinAutoBlocks

Description: Auto blocks that perform text-to-image and image-conditioned generation using Flux2-Klein.

  • For image-conditioned generation, provide image (a list of PIL images).
  • For text-to-image generation, only prompt is required.

This pipeline uses a 4-block architecture that can be customized and extended.

Example Usage

[TODO]
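A minimal sketch of how this pipeline could be invoked. The repository id is a placeholder, and the ModularPipeline.from_pretrained / load_default_components / output="images" calls are assumptions based on the Modular Diffusers framework's documented usage pattern, so verify against the actual API before running.

```python
# Hedged sketch: assumes the standard Modular Diffusers API and a
# placeholder repository id; requires downloading the checkpoint.
import torch
from PIL import Image
from diffusers.modular_pipelines import ModularPipeline

pipe = ModularPipeline.from_pretrained("<repo-id>", trust_remote_code=True)
pipe.load_default_components(torch_dtype=torch.bfloat16)

# Text-to-image: only a prompt is required.
image = pipe(prompt="a photo of a red fox in the snow", output="images")[0]

# Image-conditioned generation: additionally pass a list of PIL images,
# which triggers the VAE-encoder branch.
cond = [Image.open("reference.png")]
edited = pipe(prompt="the same scene in winter", image=cond, output="images")[0]
```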

Pipeline Architecture

This modular pipeline is composed of the following blocks:

  1. text_encoder (Flux2KleinTextEncoderStep)
    • Text encoder step that generates text embeddings using Qwen3 to guide the image generation.
  2. vae_encoder (Flux2KleinAutoVaeEncoderStep)
    • VAE encoder step that encodes the image inputs into their latent representations.
    • img_conditioning: Flux2KleinVaeEncoderSequentialStep
      • VAE encoder step that preprocesses and encodes the image inputs into their latent representations.
  3. denoise (Flux2KleinCoreDenoiseStep)
    • Core denoise step that performs the denoising process for Flux2-Klein (distilled model).
    • input: Flux2TextInputStep
      • Step that prepares the text embedding inputs for the denoising loop.
    • prepare_image_latents: Flux2PrepareImageLatentsStep
      • Step that prepares image latents and their position IDs for Flux2 image conditioning.
    • prepare_latents: Flux2PrepareLatentsStep
      • Prepare latents step that prepares the initial noise latents for Flux2 text-to-image generation.
    • set_timesteps: Flux2SetTimestepsStep
      • Step that sets the scheduler's timesteps for Flux2 inference using empirical mu calculation.
    • prepare_rope_inputs: Flux2RoPEInputsStep
      • Step that prepares the 4D RoPE position IDs for Flux2 denoising. Should be placed after text encoder and latent preparation steps.
    • denoise: Flux2KleinDenoiseStep
      • Denoise step that iteratively denoises the latents for Flux2.
    • after_denoise: Flux2UnpackLatentsStep
      • Step that unpacks the latents from the denoising step.
  4. decode (Flux2DecodeStep)
    • Step that decodes the denoised latents into images using the Flux2 VAE with batch norm denormalization.
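The four top-level blocks above can be pictured as functions that each read and extend a shared state before passing it on. The sketch below is a simplified, hypothetical illustration of that control flow (stubbed function bodies, not the actual diffusers block classes):

```python
# Hypothetical illustration of sequential block composition: each block
# receives the shared state dict, adds its outputs, and returns it.

def text_encoder(state):
    # Encode the prompt into text embeddings (stubbed).
    state["prompt_embeds"] = f"embeds({state['prompt']})"
    return state

def vae_encoder(state):
    # Only does work for image-conditioned generation.
    if state.get("image") is not None:
        state["image_latents"] = [f"latent({img})" for img in state["image"]]
    return state

def denoise(state):
    # Iterative denoising loop (stubbed to a single assignment).
    state["latents"] = "denoised_latents"
    return state

def decode(state):
    # Decode latents into output images (stubbed).
    state["images"] = ["decoded_image"]
    return state

PIPELINE = [text_encoder, vae_encoder, denoise, decode]

def run(state):
    for block in PIPELINE:
        state = block(state)
    return state["images"]
```

Running `run({"prompt": "a cat", "image": None})` walks all four blocks and returns the decoded images; passing a list under "image" additionally populates "image_latents".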

Conditional Execution

This pipeline contains blocks that are selected at runtime based on inputs:

  • Trigger Inputs: image
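Since image is the only trigger input, the runtime selection reduces to a single check. The helper below is a hypothetical sketch of that dispatch, not a diffusers function:

```python
# Hypothetical sketch of trigger-input dispatch: `image` is the only
# trigger, so its presence selects the image-conditioning branch.

def select_vae_encoder_branch(inputs):
    if inputs.get("image") is not None:
        return "img_conditioning"  # run the VAE encoder sub-block
    return None  # text-to-image: the VAE encoder step is skipped
```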

Model Components

  1. text_encoder (Qwen3ForCausalLM)
  2. tokenizer (Qwen2Tokenizer)
  3. image_processor (Flux2ImageProcessor)
  4. vae (AutoencoderKLFlux2)
  5. scheduler (FlowMatchEulerDiscreteScheduler)
  6. transformer (Flux2Transformer2DModel)

Configuration Parameters

is_distilled (default: True): whether the model is a distilled variant. Flux2-Klein is a distilled model, so this defaults to True.

Input/Output Specification

Optional Inputs:

  • prompt (Any): The text prompt (or prompts) guiding generation.
  • max_sequence_length (int), default: 512: Maximum token length for the text encoder.
  • text_encoder_out_layers (Tuple), default: (9, 18, 27): Hidden layers of the text encoder whose outputs are used as text embeddings.
  • image (Any): Conditioning images as a list of PIL images; providing this triggers image-conditioned generation.
  • height (Any): Height in pixels of the generated image.
  • width (Any): Width in pixels of the generated image.
  • generator (Any): A torch.Generator for reproducible sampling.
  • num_images_per_prompt (Any), default: 1: Number of images to generate per prompt.
  • image_latents (List): Precomputed latent representations of the conditioning images.
  • latents (Optional): Pre-generated noise latents to start denoising from.
  • num_inference_steps (Any), default: 50: Number of denoising steps.
  • timesteps (Any): Custom timestep schedule for the scheduler.
  • sigmas (Any): Custom sigma schedule for the scheduler.
  • joint_attention_kwargs (Any): Extra keyword arguments passed to the attention processors.
  • output_type (Any), default: pil: Format of the decoded images (e.g. pil, np, pt).

Outputs:

  • images (List): The images from the decoding step.
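For reference, the optional inputs and their defaults can be collected into one container. The dataclass below is purely illustrative (not part of diffusers) and mirrors the specification above:

```python
# Illustrative container mirroring the input specification; the defaults
# are copied from the table above. Not a diffusers class.
from dataclasses import dataclass
from typing import Any, Optional, Tuple

@dataclass
class Flux2KleinInputs:
    prompt: Any = None
    max_sequence_length: int = 512
    text_encoder_out_layers: Tuple[int, ...] = (9, 18, 27)
    image: Any = None
    height: Any = None
    width: Any = None
    generator: Any = None
    num_images_per_prompt: int = 1
    image_latents: Optional[list] = None
    latents: Any = None
    num_inference_steps: int = 50
    timesteps: Any = None
    sigmas: Any = None
    joint_attention_kwargs: Any = None
    output_type: str = "pil"
```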
