Inside the `forward` function of `class LongformerSelfAttention(nn.Module)`, the `attention_mask` has already been changed (in `BertModel.forward`) from the values 0, 1, 2 to:

- negative: no attention
- 0: local attention
- positive: global attention
I wonder how this change happens.
- `Longformer` inherits from `RobertaModel`, which inherits from `BertModel`. Is `BertModel.forward` also called when calling `Longformer`'s `forward` function?
- How does `BertModel.forward` change 0, 1, 2 to negative/0/positive? (My current guess is sketched below.)
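For context, here is a minimal, self-contained sketch of the conversion I think is happening. The `(1.0 - mask) * -10000.0` formula is only my guess at what `BertModel` does when it builds the extended attention mask, not something I have confirmed in the code:

```python
import torch

# Input mask convention (what I pass in): 0 = padding, 1 = local attention, 2 = global attention
attention_mask = torch.tensor([[1, 1, 2, 0]])

# Guessed transformation: 1 -> 0 (local), 2 -> +10000 (global), 0 -> -10000 (no attention)
extended_attention_mask = (1.0 - attention_mask.float()) * -10000.0

# Expected values per token: 0., 0., 10000., -10000.
print(extended_attention_mask)
```

If this is roughly what happens, it would explain the negative/0/positive values I see inside `LongformerSelfAttention.forward`, but I would like to know where exactly in the call chain this is done.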
Thank you