Pad Token and attention mask. What is the difference?

Hi there.

I apologise if the answer to this is in the docs (if so, I cannot find it). I am training BigBird from scratch, and I have noticed that `BigBirdConfig` takes a pad token id (`pad_token_id`), while the model's forward pass separately accepts an attention mask (`attention_mask`).

If I supply the id of my pad token, will the model automatically generate an attention mask that masks out the padded positions? Or do I also need to build and pass an attention mask for each input sequence myself?
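To make the question concrete, this is the kind of mask I mean (the token ids and pad id below are made up for illustration). I want to know whether the model derives something like this from `pad_token_id` on its own, or whether I am expected to construct it and pass it as `attention_mask`:

```python
pad_token_id = 0  # illustrative; matches the pad_token_id I set in BigBirdConfig

batch = [
    [5, 17, 42, 9],  # full-length sequence
    [5, 17, 0, 0],   # sequence padded with pad_token_id
]

# 1 for real tokens, 0 for padding positions
attention_mask = [[int(tok != pad_token_id) for tok in seq] for seq in batch]
print(attention_mask)  # [[1, 1, 1, 1], [1, 1, 0, 0]]
```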