This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.

Pipeline Type: Flux2KleinAutoBlocks

Description: Auto blocks that perform text-to-image and image-conditioned generation using Flux2-Klein.

  • For image-conditioned generation, provide image (a list of PIL images).
  • For text-to-image generation, only prompt is required.

This pipeline uses a 4-block architecture that can be customized and extended.

Example Usage

[TODO]
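A minimal sketch of how this pipeline could be invoked. The repository id is a placeholder, and the ModularPipeline.from_pretrained / load_default_components / output="images" calls are assumptions based on the Modular Diffusers framework's documented usage pattern, so verify against the actual API before running.

```python
# Hedged sketch: assumes the standard Modular Diffusers API and a
# placeholder repository id; requires downloading the checkpoint.
import torch
from PIL import Image
from diffusers.modular_pipelines import ModularPipeline

pipe = ModularPipeline.from_pretrained("<repo-id>", trust_remote_code=True)
pipe.load_default_components(torch_dtype=torch.bfloat16)

# Text-to-image: only a prompt is required.
image = pipe(prompt="a photo of a red fox in the snow", output="images")[0]

# Image-conditioned generation: additionally pass a list of PIL images,
# which triggers the VAE-encoder branch.
cond = [Image.open("reference.png")]
edited = pipe(prompt="the same scene in winter", image=cond, output="images")[0]
```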

Pipeline Architecture

This modular pipeline is composed of the following blocks:

  1. text_encoder (Flux2KleinTextEncoderStep)
    • Text encoder step that generates text embeddings using Qwen3 to guide the image generation.
  2. vae_encoder (Flux2KleinAutoVaeEncoderStep)
    • VAE encoder step that encodes the image inputs into their latent representations.
    • img_conditioning: Flux2KleinVaeEncoderSequentialStep
      • VAE encoder step that preprocesses and encodes the image inputs into their latent representations.
  3. denoise (Flux2KleinCoreDenoiseStep)
    • Core denoise step that performs the denoising process for Flux2-Klein (distilled model).
    • input: Flux2TextInputStep
      • Step that prepares the text embedding inputs for the denoising loop.
    • prepare_image_latents: Flux2PrepareImageLatentsStep
      • Step that prepares image latents and their position IDs for Flux2 image conditioning.
    • prepare_latents: Flux2PrepareLatentsStep
      • Prepare latents step that prepares the initial noise latents for Flux2 text-to-image generation.
    • set_timesteps: Flux2SetTimestepsStep
      • Step that sets the scheduler's timesteps for Flux2 inference using empirical mu calculation.
    • prepare_rope_inputs: Flux2RoPEInputsStep
      • Step that prepares the 4D RoPE position IDs for Flux2 denoising. Should be placed after text encoder and latent preparation steps.
    • denoise: Flux2KleinDenoiseStep
      • Denoise step that iteratively denoises the latents for Flux2.
    • after_denoise: Flux2UnpackLatentsStep
      • Step that unpacks the latents from the denoising step.
  4. decode (Flux2DecodeStep)
    • Step that decodes the denoised latents into images using the Flux2 VAE with batch norm denormalization.
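The four top-level blocks above can be pictured as functions that each read and extend a shared state before passing it on. The sketch below is a simplified, hypothetical illustration of that control flow (stubbed function bodies, not the actual diffusers block classes):

```python
# Hypothetical illustration of sequential block composition: each block
# receives the shared state dict, adds its outputs, and returns it.

def text_encoder(state):
    # Encode the prompt into text embeddings (stubbed).
    state["prompt_embeds"] = f"embeds({state['prompt']})"
    return state

def vae_encoder(state):
    # Only does work for image-conditioned generation.
    if state.get("image") is not None:
        state["image_latents"] = [f"latent({img})" for img in state["image"]]
    return state

def denoise(state):
    # Iterative denoising loop (stubbed to a single assignment).
    state["latents"] = "denoised_latents"
    return state

def decode(state):
    # Decode latents into output images (stubbed).
    state["images"] = ["decoded_image"]
    return state

PIPELINE = [text_encoder, vae_encoder, denoise, decode]

def run(state):
    for block in PIPELINE:
        state = block(state)
    return state["images"]
```

Running `run({"prompt": "a cat", "image": None})` walks all four blocks and returns the decoded images; passing a list under "image" additionally populates "image_latents".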

Conditional Execution

This pipeline contains blocks that are selected at runtime based on inputs:

  • Trigger Inputs: image
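Since image is the only trigger input, the runtime selection reduces to a single check. The helper below is a hypothetical sketch of that dispatch, not a diffusers function:

```python
# Hypothetical sketch of trigger-input dispatch: `image` is the only
# trigger, so its presence selects the image-conditioning branch.

def select_vae_encoder_branch(inputs):
    if inputs.get("image") is not None:
        return "img_conditioning"  # run the VAE encoder sub-block
    return None  # text-to-image: the VAE encoder step is skipped
```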

Model Components

  1. text_encoder (Qwen3ForCausalLM)
  2. tokenizer (Qwen2Tokenizer)
  3. image_processor (Flux2ImageProcessor)
  4. vae (AutoencoderKLFlux2)
  5. scheduler (FlowMatchEulerDiscreteScheduler)
  6. transformer (Flux2Transformer2DModel)

Configuration Parameters

is_distilled (default: True): whether the model is a distilled variant. Flux2-Klein is a distilled model, so this defaults to True.

Input/Output Specification

Optional Inputs:

  • prompt (Any): The text prompt (or prompts) guiding generation.
  • max_sequence_length (int), default: 512: Maximum token length for the text encoder.
  • text_encoder_out_layers (Tuple), default: (9, 18, 27): Hidden layers of the text encoder whose outputs are used as text embeddings.
  • image (Any): Conditioning images as a list of PIL images; providing this triggers image-conditioned generation.
  • height (Any): Height in pixels of the generated image.
  • width (Any): Width in pixels of the generated image.
  • generator (Any): A torch.Generator for reproducible sampling.
  • num_images_per_prompt (Any), default: 1: Number of images to generate per prompt.
  • image_latents (List): Precomputed latent representations of the conditioning images.
  • latents (Optional): Pre-generated noise latents to start denoising from.
  • num_inference_steps (Any), default: 50: Number of denoising steps.
  • timesteps (Any): Custom timestep schedule for the scheduler.
  • sigmas (Any): Custom sigma schedule for the scheduler.
  • joint_attention_kwargs (Any): Extra keyword arguments passed to the attention processors.
  • output_type (Any), default: pil: Format of the decoded images (e.g. pil, np, pt).

Outputs:

  • images (List): The images from the decoding step.
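For reference, the optional inputs and their defaults can be collected into one container. The dataclass below is purely illustrative (not part of diffusers) and mirrors the specification above:

```python
# Illustrative container mirroring the input specification; the defaults
# are copied from the table above. Not a diffusers class.
from dataclasses import dataclass
from typing import Any, Optional, Tuple

@dataclass
class Flux2KleinInputs:
    prompt: Any = None
    max_sequence_length: int = 512
    text_encoder_out_layers: Tuple[int, ...] = (9, 18, 27)
    image: Any = None
    height: Any = None
    width: Any = None
    generator: Any = None
    num_images_per_prompt: int = 1
    image_latents: Optional[list] = None
    latents: Any = None
    num_inference_steps: int = 50
    timesteps: Any = None
    sigmas: Any = None
    joint_attention_kwargs: Any = None
    output_type: str = "pil"
```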
