Dynamic attention mask during GPT-2 training

My task is to generate a list of options and a story, given an intro.

An option is a sentence starting with the special token '<|option|>'.
An instance looks like "intro <|endofintro|> <|option|> option1 <|option|> option2 <|endofoption|> story <|endofstory|>".
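
For concreteness, a minimal sketch of this setup, assuming the Hugging Face transformers tokenizer with the four special tokens above registered as additional special tokens:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.add_special_tokens({"additional_special_tokens": [
    "<|endofintro|>", "<|option|>", "<|endofoption|>", "<|endofstory|>"
]})

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.resize_token_embeddings(len(tokenizer))  # make room for the new tokens

instance = ("intro <|endofintro|> "
            "<|option|> option1 <|option|> option2 <|endofoption|> "
            "story <|endofstory|>")
input_ids = tokenizer(instance, return_tensors="pt").input_ids
```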

I need a dynamic attention mask because:
(1) I don't want later options to attend to earlier options, so while producing an option I should mask all previous options, e.g., set the attention mask at all previous option positions to 0.
(2) However, the story should attend to all the options, so at that point the attention mask at all option positions must be back at 1 (see the sketch after this list).
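
To make the two rules concrete: together they imply a full per-query-position mask rather than the usual 1D padding mask, since option positions and story positions need different rows. A sketch of what that matrix would look like, built from the token ids and assuming the special-token ids from the setup above:

```python
import torch

def build_attention_matrix(input_ids, option_id, endofoption_id):
    """Per-position mask: 1 = may attend, 0 = masked (causal throughout)."""
    seq_len = input_ids.size(0)

    # Tag each position with the index of the option it belongs to,
    # or -1 for intro/story positions.
    owner = torch.full((seq_len,), -1, dtype=torch.long)
    current, in_options = -1, False
    for i, tok in enumerate(input_ids.tolist()):
        if tok == option_id:
            current += 1
            in_options = True
        elif tok == endofoption_id:
            in_options = False
        if in_options:
            owner[i] = current

    # Standard causal mask, then block option -> other-option attention.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    q, k = owner.unsqueeze(1), owner.unsqueeze(0)
    mask &= ~((q >= 0) & (k >= 0) & (q != k))
    return mask  # story rows (owner == -1) keep all option columns at 1
```

(Some recent transformers versions accept a mask of shape (batch, 1, seq, seq) directly as attention_mask, but I'm not sure whether GPT-2's implementation does in my version.)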

I can implement this easily during generation by generating one option at a time with the previous options masked out, and then, once all options are generated, resetting the attention mask before generating the story (sketch below).
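
A sketch of that generation loop, assuming the model/tokenizer from above (generate() extends the attention mask with ones for newly produced tokens, so only the already-generated option spans need to be zeroed):

```python
import torch

@torch.no_grad()
def generate_options_and_story(model, tokenizer, intro_ids, n_options=2):
    option_id = tokenizer.convert_tokens_to_ids("<|option|>")
    ids, option_spans = intro_ids, []
    for _ in range(n_options):
        start = ids.size(1)
        ids = torch.cat(
            [ids, torch.tensor([[option_id]], device=ids.device)], dim=1)
        # Hide every previously generated option while decoding this one.
        attn = torch.ones_like(ids)
        for s, e in option_spans:
            attn[:, s:e] = 0
        ids = model.generate(ids, attention_mask=attn, max_new_tokens=30,
                             pad_token_id=tokenizer.eos_token_id)
        option_spans.append((start, ids.size(1)))
    # For the story, restore attention to all options
    # (append <|endofoption|> here in the real format).
    return model.generate(ids, attention_mask=torch.ones_like(ids),
                          max_new_tokens=200,
                          pad_token_id=tokenizer.eos_token_id)
```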

But I don’t know how to do that during training.
