In the Transformers library, why is the parameter
attention_mask singular? head_mask is singular as well.
I am asking because other parameters, such as
labels, are plural in forward().
Is there any historical reason for that choice?
One hypothesis is that, for a single sample in a batch,
attention_mask corresponds to one mask while
input_ids corresponds to multiple token IDs.
However, this does not explain why
labels is plural.
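To make the shapes concrete, here is a minimal sketch in plain Python (values invented for illustration; in practice these would be tensors passed to forward()). Note that attention_mask actually has one entry per token, the same shape as input_ids, while for sequence classification labels holds one value per sample:

```python
# Illustrative shapes for a batch of 2 sequences of length 4.
# Token IDs: multiple values per sample -> named in the plural.
input_ids = [[101, 2054, 2003, 102],
             [101, 7592, 1012, 102]]

# One mask entry per token (1 = attend, 0 = padding),
# yet the parameter is named in the singular.
attention_mask = [[1, 1, 1, 1],
                  [1, 1, 0, 0]]

# For sequence classification: one label per sample -> plural name.
labels = [0, 1]

# attention_mask mirrors input_ids element-for-element.
assert len(attention_mask) == len(input_ids)
assert len(attention_mask[0]) == len(input_ids[0])
# labels has one entry per sample in the batch.
assert len(labels) == len(input_ids)
```

So by shape alone, attention_mask is no "more singular" than input_ids, which is what makes the naming inconsistency puzzling.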