In the Transformers library, why is the parameter `attention_mask` singular? Even `head_mask` is singular. I am asking because other parameters like `input_ids`, `position_ids`, and `labels` are plural in `forward()`. Is there a historical reason for that choice?
One hypothesis I have is that, for a single sample in a batch, `attention_mask` corresponds to one mask while `input_ids` corresponds to multiple values. However, that does not explain why `labels` is plural.
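To make the shape argument concrete, here is a minimal sketch of what these arguments typically look like for a BERT-style sequence-classification `forward()` call (the concrete values and the single-label-per-sample setting are my own illustrative assumptions, not from the library docs):

```python
batch_size = 2
seq_len = 4

# One row of token ids per sample -> many values per sample,
# hence the plural name "input_ids".
input_ids = [[101, 2023, 2003, 102],
             [101, 2146, 0,    0]]

# One 0/1 entry per token per sample (1 = attend, 0 = padding).
# Same shape as input_ids, yet the parameter is singular:
# arguably "one mask per sample".
attention_mask = [[1, 1, 1, 1],
                  [1, 1, 0, 0]]

# For sequence classification there is one label per sample,
# yet the parameter name "labels" is plural.
labels = [0, 1]

assert len(input_ids) == len(attention_mask) == len(labels) == batch_size
assert all(len(row) == seq_len for row in input_ids)
```

Under this reading, `attention_mask` and `input_ids` have exactly the same shape, which is what makes the singular/plural split puzzling.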