When we don't pass a decoder attention mask to BartModel, the model automatically creates decoder input masks.
I've noticed that the method inserts 0 in mask positions corresponding to indices the model should attend to, and -inf in positions corresponding to indices to be ignored. Below is the link to the aforementioned code:
As far as I know, attention masks should have 1 at the indices we want to attend to. Could anyone shed some light on this?
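To illustrate what I mean, here is a minimal NumPy sketch of the two conventions as I understand them (this is my own toy example, not the actual Hugging Face implementation): the user-facing mask uses 1/0, while an *additive* mask uses 0/-inf and is added to the raw attention scores before the softmax, so both end up ignoring the same positions.

```python
import numpy as np

# User-facing convention: 1 = attend, 0 = ignore.
binary_mask = np.array([1, 1, 1, 0])

# Internal additive convention: 0 = attend, -inf = ignore.
# This mask is added to the attention logits before the softmax.
additive_mask = np.where(binary_mask == 1, 0.0, -np.inf)

scores = np.array([2.0, 1.0, 0.5, 3.0])  # toy raw attention logits
masked = scores + additive_mask          # -inf wipes out ignored slots

# Softmax: exp(-inf) == 0, so masked positions get zero weight.
weights = np.exp(masked - masked.max())
weights /= weights.sum()

print(weights)  # the last (masked) position gets probability 0
```

So if this reading is right, the 0/-inf mask isn't contradicting the 1/0 mask; it's just the same information expressed additively.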