Pass a custom mask when using RoBERTa

Suppose I have a sequence that consists of 2 sentences separated by a </SEP> token, like A </SEP> B. When performing a forward pass with the RoBERTa model, I want tokens in sentence A to attend only to tokens in sentence A, and vice versa for sentence B. The mask would look like this:
[image: block-diagonal attention mask]
In summary, is there any way to explicitly pass a custom attention mask to the model?
Thanks in advance.


The attention mask is normally created from input_mask. I don't think you can pass a custom attention mask directly, but I might be wrong.

For your purpose, create an input_mask with 1s in the first two rows and first two columns, and 1s in the last two rows and last two columns. Set everything else to 0.
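As a rough sketch of that block pattern in PyTorch, assuming a hypothetical 7-token sequence and a made-up seg vector marking which sentence each position belongs to:

```python
import torch

# Hypothetical 7-token sequence: positions 0-2 are sentence A, position 3
# is the separator, positions 4-6 are sentence B.
seg = torch.tensor([0, 0, 0, 1, 2, 2, 2])

# Position i may attend to position j only when both positions fall in the
# same segment, which gives the block-diagonal pattern described above.
mask_2d = (seg.unsqueeze(0) == seg.unsqueeze(1)).long()   # shape (7, 7)
print(mask_2d)
```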

Yes, all models take an attention_mask argument that you can customize.
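For example, a minimal sketch assuming roberta-base, where the mask is the standard one value per token:

```python
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")

# The standard attention_mask has shape (batch_size, sequence_length):
# 1 for real tokens, 0 for padding added by the tokenizer.
inputs = tokenizer(["A </s> B", "just one sentence"], padding=True, return_tensors="pt")
outputs = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
```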


From the documentation I see that the model accepts a 1D tensor (assuming a batch size of 1), but what I need is a 2D attention mask. How can I manage that? Thanks for your response.

attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) –

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

  • 1 for tokens that are not masked,
  • 0 for tokens that are masked.

You can change extended_attention_mask inside the RoBERTa model in order to make a 3D attention mask for your case.

I suggest that extended_attention_mask could be passed as a parameter.
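For reference, here is a rough sketch of what that expansion looks like, mirroring (not copied from) what get_extended_attention_mask does in transformers; the exact fill value for blocked positions depends on the version:

```python
import torch

# Hypothetical 3D mask of shape (batch_size, from_seq_len, to_seq_len).
attention_mask = torch.ones(1, 7, 7)

# Broadcast over the attention heads -> (batch, 1, from, to), then turn the
# 0/1 entries into an additive mask: allowed positions add 0, blocked
# positions add a large negative value before the softmax.
extended_attention_mask = attention_mask[:, None, :, :]
extended_attention_mask = (1.0 - extended_attention_mask) * torch.finfo(torch.float32).min
```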


I have a similar issue. Can you provide a minimal example of how to use a 3D attention mask per input sequence?
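In case it is useful, here is a minimal sketch of what I would try, assuming roberta-base and a transformers version whose get_extended_attention_mask accepts a 3D (batch_size, from_seq_len, to_seq_len) mask (recent versions do, but it is worth checking yours):

```python
import torch
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")

inputs = tokenizer("The cat sat. </s> The dog barked.", return_tensors="pt")
input_ids = inputs["input_ids"]                    # shape (1, seq_len)
seq_len = input_ids.shape[1]

# Mark each position with the sentence it belongs to: 0 up to and including
# the first </s> separator, 1 afterwards.
first_sep = (input_ids[0] == tokenizer.sep_token_id).nonzero()[0].item()
seg = torch.zeros(seq_len, dtype=torch.long)
seg[first_sep + 1:] = 1

# Block-diagonal (seq_len, seq_len) mask, plus a batch dimension.
mask_3d = (seg.unsqueeze(0) == seg.unsqueeze(1)).long().unsqueeze(0)

# The 3D mask is broadcast over the heads internally, so it can be passed
# straight to forward() as attention_mask.
outputs = model(input_ids=input_ids, attention_mask=mask_3d)
print(outputs.last_hidden_state.shape)
```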