Does `attention_mask` refer to `input_ids` or to `labels`?

Thanks, that’s a clear and succinct explanation!

But I guess my question still stands regarding `decoder_input_ids`: if it is built from `labels` (see my other question), that would mean, if I understand correctly, that the labels (shifted right) are used as the decoder's input during computation, no?
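To make the "shifted right" part concrete, here is a minimal sketch of the idea in plain Python. It mirrors what helpers like `shift_tokens_right` in `transformers` seq2seq models do when only `labels` are passed, but the function below is illustrative, not the library's actual implementation:

```python
# Illustrative sketch (not the transformers implementation): deriving
# decoder_input_ids from labels by shifting them one position to the right.
def shift_right(labels, decoder_start_token_id, pad_token_id):
    """Prepend the decoder start token and drop the last label token."""
    shifted = [decoder_start_token_id] + labels[:-1]
    # Replace any -100 (positions ignored by the loss) with the pad token,
    # since -100 is only meaningful for the loss, not as an input id.
    return [pad_token_id if t == -100 else t for t in shifted]

labels = [42, 7, 99, 1]  # 1 = eos in this toy vocabulary
print(shift_right(labels, decoder_start_token_id=0, pad_token_id=0))
# [0, 42, 7, 99]
```

So at each decoder step, the model sees the gold previous label tokens as input (teacher forcing) and is trained to predict the unshifted `labels`.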
