Suppose I have a sequence consisting of two sentences separated by a </SEP> token, like A </SEP> B. When performing a forward pass with a RoBERTa model, I want tokens in sentence A to attend only to tokens in sentence A, and likewise for sentence B. The mask would look something like this:
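For illustration, take a toy five-token sequence a1 a2 </SEP> b1 b2 (the tokens are made up); rows are query positions, columns are key positions, and 1 means attention is allowed:

```python
# Illustrative only; here </SEP> is grouped with sentence A, but that choice is up to you.
#        a1  a2 SEP  b1  b2
mask = [
        [1,  1,  1,  0,  0],  # a1
        [1,  1,  1,  0,  0],  # a2
        [1,  1,  1,  0,  0],  # </SEP>
        [0,  0,  0,  1,  1],  # b1
        [0,  0,  0,  1,  1],  # b2
]
```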
In summary, is there any way to explicitly pass a custom attention mask to the model?
Thanks in advance.
The attention mask is normally created from the input_mask. I don't think you can pass a custom attention mask directly, but I might be wrong.
For your purpose, create a mask with 1s in the first two rows and first two columns, 1s in the last two rows and last two columns, and 0s everywhere else.
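A rough sketch of that idea, assuming you know how many tokens each sentence occupies (the lengths below are placeholders):

```python
import torch

# Placeholder token counts; use the real lengths, and include the </SEP> token
# in whichever block you want it to belong to.
len_a, len_b = 3, 2

# Block-diagonal (seq_len x seq_len) mask: 1 where attention is allowed, 0 elsewhere.
mask_2d = torch.block_diag(
    torch.ones(len_a, len_a, dtype=torch.long),
    torch.ones(len_b, len_b, dtype=torch.long),
)
print(mask_2d)
```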
Yes, all models take an attention_mask argument that you can customize.
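For example, the usual way is to pass the mask the tokenizer returns (just a sketch; "roberta-base" is only an example checkpoint):

```python
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")

enc = tokenizer("A </SEP> B", return_tensors="pt")
# enc["attention_mask"] has shape (batch_size, sequence_length)
out = model(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"])
```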
From the documentation I see that the model accepts a 1D tensor per sequence (assuming a batch size of 1), but what I need is a 2D attention mask. How can I manage that? Thanks for your response.
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) –
Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
You can change extended_attention_mask inside the RoBERTa model in order to build a 3D attention mask for your case.
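A minimal sketch of that approach, assuming a subclass (the name RobertaWithCustomMask is made up) that intercepts the mask-extension step; the exact signature of get_extended_attention_mask varies a bit across transformers versions, so extra arguments are simply passed through:

```python
import torch
from transformers import RobertaModel

class RobertaWithCustomMask(RobertaModel):
    def get_extended_attention_mask(self, attention_mask, input_shape, *args, **kwargs):
        # If a (batch, seq_len, seq_len) mask is supplied, use it as-is instead of
        # letting the library expand a 2D padding mask.
        if attention_mask is not None and attention_mask.dim() == 3:
            extended = attention_mask[:, None, :, :].to(dtype=self.dtype)
            # Convert 0/1 to an additive mask: 0 keeps a position,
            # a large negative value removes it from the softmax.
            return (1.0 - extended) * torch.finfo(self.dtype).min
        return super().get_extended_attention_mask(attention_mask, input_shape, *args, **kwargs)
```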
I suggest allowing extended_attention_mask to be passed in as a parameter.
I have a similar issue. Can you provide a minimal example of how to use a 3D attention mask per input sequence?
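Not an authoritative answer, but as a starting point: recent versions of transformers appear to accept a 3D mask of shape (batch_size, seq_len, seq_len) directly, since get_extended_attention_mask broadcasts it over the attention heads. A minimal sketch (the split point between the two sentences is made up; in practice derive it from the </SEP> position):

```python
import torch
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")

enc = tokenizer("The cat sat. The dog barked.", return_tensors="pt")
seq_len = enc["input_ids"].shape[1]
split = seq_len // 2  # placeholder boundary between sentence A and sentence B

# Block-diagonal (1, seq_len, seq_len) mask: tokens attend only within their own block.
mask_3d = torch.zeros(1, seq_len, seq_len, dtype=torch.long)
mask_3d[0, :split, :split] = 1
mask_3d[0, split:, split:] = 1

out = model(input_ids=enc["input_ids"], attention_mask=mask_3d)
print(out.last_hidden_state.shape)
```

If your version of the library rejects the 3D mask, overriding get_extended_attention_mask as sketched above is an alternative.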