In the Transformers library, why are the parameters attention_mask and head_mask singular? I ask because other forward() parameters, such as input_ids, position_ids, and labels, are plural.
Is there a historical reason for that choice?
One hypothesis is that, for a single sample in a batch, attention_mask corresponds to one object (the mask as a whole), while input_ids corresponds to multiple values (one id per token).
However, that does not explain why labels is plural, since labels is also one object (a sequence of targets) per sample.
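To make the shape argument concrete, here is a minimal sketch of a padded batch of two sequences (plain lists standing in for tensors; the token ids are made up for illustration, not real vocabulary entries). Note that attention_mask has exactly the same per-token shape as input_ids and labels, which is what makes the singular name feel inconsistent:

```python
# A made-up batch of 2 sequences padded to length 5.
batch = {
    # one id per token, per sequence -> plural name fits
    "input_ids": [
        [101, 7592, 2088, 102, 0],
        [101, 2129, 102, 0, 0],
    ],
    # one 0/1 flag per token, per sequence -> same shape as input_ids,
    # yet the parameter name is singular
    "attention_mask": [
        [1, 1, 1, 1, 0],
        [1, 1, 1, 0, 0],
    ],
    # one target per token, per sequence (-100 marks padding) -> plural again
    "labels": [
        [7592, 2088, 102, -100, -100],
        [2129, 102, -100, -100, -100],
    ],
}

for name, value in batch.items():
    print(name, len(value), "x", len(value[0]))
# Every entry prints the same 2 x 5 shape.
```

So the naming difference cannot be explained by shape alone; if anything, the "one mask per sample" reading would apply equally well to "one label sequence per sample."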