In earlier T2I models such as Stable Diffusion 1.5, which used a UNet backbone (`self.unet`), we could register attention stores and custom attention processors to steer denoising by editing the self- and cross-attention maps. For more details, refer to this resource.
Currently, I'm looking for help with hooking into the attention mechanism in the following call:
```python
noise_pred = self.transformer(
    latent_model_input, timestep=timesteps, class_labels=class_labels_input
).sample
```
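For context, here is the kind of hook I have in mind: a minimal sketch modeled on diffusers' default `AttnProcessor` that records every attention map into a shared list; editing the maps would happen at the same point. The class name `AttnMapStoreProcessor`, the `store` list, and the `pipe` variable are my own placeholders, and the sketch assumes your diffusers version exposes `set_attn_processor` on the transformer. It also skips the group-norm/cross-norm branches that DiT-style self-attention blocks don't use.

```python
import torch
from diffusers.models.attention_processor import Attention


class AttnMapStoreProcessor:
    """Hypothetical processor, modeled on diffusers' default AttnProcessor.

    Records each attention map into a shared list; to *edit* attention
    instead, modify `attention_probs` before the bmm below.
    """

    def __init__(self, store):
        self.store = store  # shared list collecting attention maps

    def __call__(self, attn: Attention, hidden_states,
                 encoder_hidden_states=None, attention_mask=None, **kwargs):
        batch_size, seq_len, _ = hidden_states.shape
        attention_mask = attn.prepare_attention_mask(
            attention_mask, seq_len, batch_size
        )

        query = attn.to_q(hidden_states)
        # Self-attention when no encoder states are passed (the DiT case).
        if encoder_hidden_states is None:
            encoder_hidden_states = hidden_states
        key = attn.to_k(encoder_hidden_states)
        value = attn.to_v(encoder_hidden_states)

        query = attn.head_to_batch_dim(query)
        key = attn.head_to_batch_dim(key)
        value = attn.head_to_batch_dim(value)

        # Hook point: store the maps here, or overwrite them to edit.
        attention_probs = attn.get_attention_scores(query, key, attention_mask)
        self.store.append(attention_probs.detach().cpu())

        hidden_states = torch.bmm(attention_probs, value)
        hidden_states = attn.batch_to_head_dim(hidden_states)

        hidden_states = attn.to_out[0](hidden_states)  # output projection
        hidden_states = attn.to_out[1](hidden_states)  # dropout
        return hidden_states


# pipe = DiTPipeline.from_pretrained(...)  # assumed: pipeline already loaded
store = []
# A single processor instance is applied to every attention layer;
# set_attn_processor also accepts a dict keyed by layer name if you
# only want to hook specific blocks.
pipe.transformer.set_attn_processor(AttnMapStoreProcessor(store))
```

If `set_attn_processor` isn't available on the transformer in your version, iterating `pipe.transformer.named_modules()` and calling `set_processor` on each `Attention` instance should achieve the same thing.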
Does anyone have insights on how to achieve this? I believe models built on an MMDiT architecture could benefit significantly from attention editing, potentially enabling finer control over generation. Any guidance would be greatly appreciated!