Overwrite attention heads in BartForConditionalGeneration

matthewmcilwham · June 22, 2021, 1:05pm

Hi,

I am looking to overwrite the attention heads in the BART model using the following process:

1. Run the model on an article with the keyword parameter “Covid”.
2. Save the encoder/decoder attention heads for this article.
3. Run the model on another article, also with the keyword parameter “Covid”.
4. As a proxy for making the model ‘topic-aware’, insert the “Covid” attention heads saved in step 2 into the model run from step 3, replacing that run's own attention heads.
5. The model will then generate a new ‘topic-aware’ summary of the second article, since the inserted attention heads were ‘trained’ on the topic keyword “Covid”.

Note: The above is extremely preliminary; in the future we will be looking to train the attention heads and the model on more data for each keyword.

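import torch
from transformers import AutoTokenizer, BartForConditionalGeneration

article = """Covid-19 is a global pandemic"""
model_name = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# The flag is output_attentions (plural); it makes the forward pass return attention weights
model = BartForConditionalGeneration.from_pretrained(model_name, output_attentions=True)
model.eval()

inputs = tokenizer(article, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    # generate() returns token ids, which is what tokenizer.decode expects
    summary_ids = model.generate(**inputs)
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    # A forward pass returns the attention weights: a tuple with one tensor per layer,
    # each of shape (batch, num_heads, query_len, key_len)
    outputs = model(**inputs)

covid_encoder_attention = outputs.encoder_attentions
covid_decoder_attention = outputs.decoder_attentions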

# Repeat model run with new article and insert covid_encoder_attention and/or covid_decoder_attention for new run
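The sketch below is not a transformers API. It is a minimal single-head attention in NumPy, assuming that "attention heads" here means the attention weights returned as encoder_attentions/decoder_attentions. It illustrates that those weights are computed from each input as softmax(QK^T/√d), so weights saved from one article have a shape tied to that article's token count and can only be substituted into a run on an input of the same length. The helpers `softmax` and `attention` and the random "article" vectors are hypothetical stand-ins.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, override_weights=None):
    """Single-head scaled dot-product attention (hypothetical helper).

    q: (query_len, d); k, v: (key_len, d).
    If override_weights is given, it replaces softmax(q @ k.T / sqrt(d))
    and must have shape (query_len, key_len).
    """
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    if override_weights is not None:
        if override_weights.shape != weights.shape:
            raise ValueError(
                f"saved weights have shape {override_weights.shape}, "
                f"but this input needs {weights.shape}"
            )
        weights = override_weights
    # Output is a weighted average of the value vectors
    return weights @ v, weights

rng = np.random.default_rng(0)
d = 8
article_a = rng.normal(size=(5, d))  # hypothetical 5-token "Covid" article
article_b = rng.normal(size=(7, d))  # hypothetical 7-token second article
article_c = rng.normal(size=(5, d))  # hypothetical article with the same length as A

# Step 2: save the attention weights computed for the first article
_, saved_weights = attention(article_a, article_a, article_a)

# Step 4: substitution only works when the new input has the same number of tokens
out_c, used = attention(article_c, article_c, article_c, override_weights=saved_weights)

try:
    attention(article_b, article_b, article_b, override_weights=saved_weights)
except ValueError as err:
    print(err)  # saved weights have shape (5, 5), but this input needs (7, 7)
```

In BART itself the weights are computed inside every attention layer for every head, so substituting saved ones would presumably require patching or hooking the model's attention modules, and the same shape constraint would still apply (for decoder weights, also to the generated summary length).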

tfburns · August 25, 2021, 11:16am

I’m also curious to know how this could be done. I haven’t found any method in transformers that allows it.
