Is attention_mask needed for training Bart?

Hi, I’m experimenting with fine-tuning Bart for a summarization task.

I tried both “with attention_mask” and “without attention_mask”, and both seemed to work.

Could someone explain when to use attention_mask and why?

Thanks in advance.

This should be of help: Glossary — transformers 4.3.0 documentation
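
In short, attention_mask matters whenever your batch contains padding: it tells the model which positions are real tokens (1) and which are padding (0), so padded positions are ignored by self-attention. If every sequence in a batch has the same length (or you train with batch size 1 and no padding), leaving it out happens to give the same result, which is likely why both of your runs worked. Here is a minimal sketch, assuming the facebook/bart-base checkpoint as an example:

```python
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Two documents of different lengths: padding brings them to the same length,
# and attention_mask marks real tokens (1) vs padding (0).
docs = ["A short article.", "A much longer article that needs more tokens to encode."]
summaries = ["Short.", "Longer summary."]

inputs = tokenizer(docs, padding=True, truncation=True, return_tensors="pt")
labels = tokenizer(summaries, padding=True, truncation=True, return_tensors="pt").input_ids

# With attention_mask, the encoder ignores padded positions; without it,
# padding tokens are attended to and can change the result for padded batches.
outputs = model(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    labels=labels,
)
print(outputs.loss)
```

(In a real training loop you would usually also replace padded label positions with -100 so they are excluded from the loss.)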
