Adding cross-attention to custom models

Hi, I'm thinking of adding cross-attention between a vision transformer and a BERT model. I was wondering whether there is a way to do this using the HF library.

What I had in mind is that if I had access to the place in the HF BERT model where it takes in the queries, keys, and values, I could subclass that submodule and add cross-attention on top of the existing self-attention. I'm picturing something very much like the code from The Annotated Transformer, specifically its DecoderLayer class, which has self_attn as well as an additional src_attn that I would need to add in.
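Roughly the shape I have in mind, as an illustrative sketch using plain nn.MultiheadAttention rather than the real HF modules:

import torch.nn as nn

class DecoderLayerSketch(nn.Module):
    """Toy version of the Annotated Transformer's DecoderLayer idea:
    self-attention over the text tokens plus a src_attn block whose
    keys/values come from the vision encoder output."""
    def __init__(self, d_model=768, nhead=12, dim_ff=3072):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.src_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, dim_ff), nn.GELU(), nn.Linear(dim_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, memory):
        # x: text hidden states, memory: vision encoder hidden states
        x = self.norm1(x + self.self_attn(x, x, x, need_weights=False)[0])
        # queries come from the text side, keys/values from the image side
        x = self.norm2(x + self.src_attn(x, memory, memory, need_weights=False)[0])
        return self.norm3(x + self.ff(x))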

I'm also aware that I would need to copy the pretrained weights for everything except the new src_attn module; I just need some mechanism to do so. Fingers crossed there is somewhere in the HF API that lets me do exactly that.
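The kind of mechanism I'm imagining is loading the original state dict with strict=False, so that only the new parameters keep their random initialization. A toy sketch (not the real BERT classes):

import torch.nn as nn

# Toy modules standing in for a pretrained layer and its extended version.
plain_layer = nn.ModuleDict({"self_attn": nn.Linear(768, 768)})
extended_layer = nn.ModuleDict({"self_attn": nn.Linear(768, 768),
                                "src_attn": nn.Linear(768, 768)})

# strict=False copies every matching weight and simply reports the rest.
missing, unexpected = extended_layer.load_state_dict(plain_layer.state_dict(), strict=False)
print(missing)  # only the src_attn parameters, which stay randomly initialized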

Happy to do this myself if someone can point me to where in the HF library I should be looking to see where the queries, keys, and values arguments are used.

Bump. Sorry, just wondering if anyone has any ideas about this.

Partial answer:

model1 = AutoModel.from_pretrained("gpt2")
gpt_config = model1.config
gpt_config.add_cross_attention = True
new_model = AutoModelForCausalLM.from_pretrained("gpt2", config=gpt_config)
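With add_cross_attention set, the forward pass accepts encoder_hidden_states, which is where the vision encoder's output would go. A minimal sketch, with a random tensor standing in for real image features and the encoder hidden size assumed to match GPT-2's n_embd:

import torch

# Stand-in for a vision encoder's output (e.g. a ViT's last_hidden_state);
# shape (batch, num_patches, hidden_size), assumed to equal n_embd here.
encoder_out = torch.randn(1, 197, new_model.config.n_embd)
input_ids = torch.tensor([[50256]])  # GPT-2's end-of-text token as a dummy prompt
outputs = new_model(input_ids=input_ids, encoder_hidden_states=encoder_out)
print(outputs.logits.shape)  # (1, 1, vocab_size)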

Similarly, for encoder-only models like BERT you need one additional step, marking the model as a decoder:

model1 = AutoModel.from_pretrained("bert-base-cased")
bert_config = model1.config
bert_config.add_cross_attention = True
bert_config.is_decoder = True
model2 = AutoModel.from_pretrained("bert-base-cased", config=bert_config)
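model2 then also accepts encoder_hidden_states in its forward pass. A minimal sketch, again with a random tensor standing in for the image encoder output and the hidden sizes assumed to match:

import torch

# Stand-in for the vision encoder output; hidden size assumed to equal
# BERT's hidden_size (768 for bert-base-cased).
encoder_out = torch.randn(1, 197, model2.config.hidden_size)
input_ids = torch.tensor([[101, 102]])  # [CLS] [SEP] as placeholder input
outputs = model2(input_ids=input_ids, encoder_hidden_states=encoder_out)
print(outputs.last_hidden_state.shape)  # (1, 2, 768)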