Hi, I’m experimenting with fine-tuning BART for a summarization task.
I tried both “with attention_mask” and “without attention_mask”,
and both seemed to work.
Could someone explain when to use attention_mask and why?
Thanks in advance.
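For context, this is roughly what I tried (simplified; the checkpoint name and data here are just placeholders):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Two inputs of different lengths, so the batch gets padded.
batch = tokenizer(
    ["A long article that needs summarizing ...", "A short one."],
    padding=True, truncation=True, return_tensors="pt",
)
labels = tokenizer(
    ["A summary.", "Short."], padding=True, return_tensors="pt"
).input_ids
labels[labels == tokenizer.pad_token_id] = -100  # don't compute loss on padding

# Variant 1: pass the attention mask along with the input ids.
out_with = model(input_ids=batch.input_ids,
                 attention_mask=batch.attention_mask,
                 labels=labels)

# Variant 2: omit the attention mask.
out_without = model(input_ids=batch.input_ids, labels=labels)

print(out_with.loss, out_without.loss)  # both run without errors
```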
This should be of help: Glossary — transformers 4.3.0 documentation (see the attention mask entry). In short, attention_mask marks which tokens are real input and which are padding, so the model can ignore the pad tokens. If your batches contain no padding (all sequences the same length), omitting it makes no difference, which is likely why both of your runs worked.
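A minimal way to see the difference yourself, assuming a batch that actually contains padding (the checkpoint name is just an example):

```python
import torch
from transformers import BartModel, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartModel.from_pretrained("facebook/bart-base").eval()

# Pad a short sentence out to a fixed length, as would happen in a batch.
padded = tokenizer("Hello world", padding="max_length", max_length=12,
                   return_tensors="pt")

with torch.no_grad():
    # With the mask, attention skips the pad tokens entirely.
    h_masked = model(**padded).last_hidden_state
    # Without it, the model attends to pad tokens as if they were real input.
    h_unmasked = model(input_ids=padded.input_ids).last_hidden_state

print(torch.allclose(h_masked, h_unmasked, atol=1e-5))  # typically False
```

Training can still converge without the mask, but the pad tokens leak into the representations, so passing attention_mask whenever you pad is the safe default.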