This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.
Pipeline Type: Flux2KleinAutoBlocks
Description: Auto blocks that perform text-to-image and image-conditioned generation using Flux2-Klein.
- For image-conditioned generation, you need to provide image (a list of PIL images).
- For text-to-image generation, all you need to provide is prompt.
This pipeline uses a 4-block architecture that can be customized and extended.
Example Usage
[TODO]
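Until an official example is added, here is a minimal sketch of how modular pipelines are typically loaded and run with Diffusers. The repo id is a placeholder, and the exact call signature (in particular `output="images"` and `load_components`) should be checked against the Modular Diffusers documentation for your installed version.

```python
import torch
from diffusers import ModularPipeline

# Placeholder repo id -- substitute the actual Flux2-Klein modular repo.
pipe = ModularPipeline.from_pretrained("<repo-id>")
pipe.load_components(torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Text-to-image: only `prompt` is required.
image = pipe(
    prompt="a photo of a cat wearing a tiny hat",
    num_inference_steps=50,
    output="images",
)[0]
image.save("t2i.png")

# Image-conditioned generation: additionally pass `image` (a list of PIL images).
edited = pipe(
    prompt="make the hat red",
    image=[image],
    num_inference_steps=50,
    output="images",
)[0]
```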
Pipeline Architecture
This modular pipeline is composed of the following blocks:
- text_encoder (Flux2KleinTextEncoderStep) - Text encoder step that generates text embeddings using Qwen3 to guide the image generation.
- vae_encoder (Flux2KleinAutoVaeEncoderStep) - VAE encoder step that encodes the image inputs into their latent representations.
  - img_conditioning: Flux2KleinVaeEncoderSequentialStep - VAE encoder step that preprocesses and encodes the image inputs into their latent representations.
- denoise (Flux2KleinCoreDenoiseStep) - Core denoise step that performs the denoising process for Flux2-Klein (distilled model).
  - input: Flux2TextInputStep - Step that prepares the text embedding inputs for denoising.
  - prepare_image_latents: Flux2PrepareImageLatentsStep - Step that prepares image latents and their position IDs for Flux2 image conditioning.
  - prepare_latents: Flux2PrepareLatentsStep - Step that prepares the initial noise latents for Flux2 text-to-image generation.
  - set_timesteps: Flux2SetTimestepsStep - Step that sets the scheduler's timesteps for Flux2 inference using an empirical mu calculation.
  - prepare_rope_inputs: Flux2RoPEInputsStep - Step that prepares the 4D RoPE position IDs for Flux2 denoising; should be placed after the text encoder and latent preparation steps.
  - denoise: Flux2KleinDenoiseStep - Step that iteratively denoises the latents for Flux2.
  - after_denoise: Flux2UnpackLatentsStep - Step that unpacks the latents from the denoising step.
- decode (Flux2DecodeStep) - Step that decodes the denoised latents into images using the Flux2 VAE with batch-norm denormalization.
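The "empirical mu calculation" in the set_timesteps step follows the shift schedule Diffusers uses for Flux-family flow-matching schedulers: mu is linearly interpolated from the packed image sequence length and then used to shift the sigma schedule so larger images receive proportionally more noise. A self-contained sketch, assuming Flux2-Klein uses the same scheme and constants as the Flux pipelines (the actual values may differ):

```python
import math

def calculate_shift(image_seq_len, base_seq_len=256, max_seq_len=4096,
                    base_shift=0.5, max_shift=1.15):
    """Linearly interpolate mu from the packed image sequence length."""
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    b = base_shift - m * base_seq_len
    return image_seq_len * m + b

def time_shift(mu, sigma):
    """Shift a sigma value by mu; mu = 0 leaves the schedule unchanged."""
    return math.exp(mu) / (math.exp(mu) + (1 / sigma - 1))

# Example: a 1024x1024 image packed into 2x2 latent patches -> 4096 tokens.
mu = calculate_shift(4096)
```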
Conditional Execution
This pipeline contains blocks that are selected at runtime based on inputs:
- Trigger Inputs: image
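Illustratively, trigger-based selection means the pipeline inspects the inputs of each call and picks the matching sub-workflow at runtime. A toy sketch of the idea (not the actual Diffusers dispatch code):

```python
def select_blocks(inputs):
    """Toy dispatcher: `image` is the trigger input for the VAE-encoder branch."""
    blocks = ["text_encoder"]
    if inputs.get("image") is not None:
        blocks.append("vae_encoder")  # image-conditioned branch
    blocks += ["denoise", "decode"]
    return blocks

select_blocks({"prompt": "a cat"})                      # text-to-image path
select_blocks({"prompt": "a cat", "image": ["<pil>"]})  # image-conditioned path
```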
Model Components
- text_encoder (Qwen3ForCausalLM)
- tokenizer (Qwen2Tokenizer)
- image_processor (Flux2ImageProcessor)
- vae (AutoencoderKLFlux2)
- scheduler (FlowMatchEulerDiscreteScheduler)
- transformer (Flux2Transformer2DModel)
Configuration Parameters
- is_distilled (default: True)
Input/Output Specification
Inputs (all optional):
- prompt (Any)
- max_sequence_length (int), default: 512
- text_encoder_out_layers (Tuple), default: (9, 18, 27)
- image (Any)
- height (Any)
- width (Any)
- generator (Any)
- num_images_per_prompt (Any), default: 1
- image_latents (List)
- latents (Optional)
- num_inference_steps (Any), default: 50
- timesteps (Any)
- sigmas (Any)
- joint_attention_kwargs (Any)
- output_type (Any), default: pil
Outputs:
- images (List): The images from the decoding step.