Longformer's attention_mask

Inside the forward function of class LongformerSelfAttention(nn.Module):

The attention_mask arrives already changed (in BertModel.forward) from the values 0, 1, 2 to:
-ve: no attention
0: local attention
+ve: global attention
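
For concreteness, here is a toy illustration of how values with these signs could be interpreted inside the self-attention (the tensor values and variable names below are just my own labels, not necessarily the actual Longformer code):

```python
import torch

# Toy values in the convention described above: 0 = local, +ve = global, -ve = masked.
attention_mask = torch.tensor([[0.0, 0.0, 10000.0, -10000.0]])

is_index_masked = attention_mask < 0       # padding tokens: no attention at all
is_index_global_attn = attention_mask > 0  # tokens given global attention
# everything equal to 0 keeps the ordinary sliding-window (local) attention

print(is_index_masked)       # tensor([[False, False, False,  True]])
print(is_index_global_attn)  # tensor([[False, False,  True, False]])
```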

I wonder how this change happens.

  • Longformer inherits from RobertaModel, which inherits from BertModel. Would BertModel's forward also be called when calling Longformer's forward function?
  • How does BertModel's forward change 0, 1, 2 to -ve/0/+ve?
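
My guess (just a sketch, I haven't traced the actual transformers source) is that the usual BERT-style extended-attention-mask arithmetic already produces this mapping, i.e. something like `(1.0 - mask) * -10000.0`, which sends 0 → -10000, 1 → 0, and 2 → +10000:

```python
import torch

# Hypothetical sketch of the arithmetic, assuming the standard
# "extended attention mask" trick used by BERT-style models.
attention_mask = torch.tensor([[1, 1, 2, 0]])  # 1 = local, 2 = global, 0 = padding

extended_mask = attention_mask[:, None, None, :].to(torch.float32)
extended_mask = (1.0 - extended_mask) * -10000.0
# 1 -> 0 (local), 2 -> +10000 (global), 0 -> -10000 (no attention)

print(extended_mask.squeeze())  # roughly [0., 0., 10000., -10000.]
```

If that is roughly right, I would still like to know where exactly in BertModel's forward this happens.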

Thank you