Hi there,
I am implementing a seq2seq model. I am padding the target sequences using:

    DataCollatorForSeq2Seq(
        tokenizer=t5_tokenizer,
        padding=self.padding,
        label_pad_token_id=-100,
        return_tensors="pt",
    )

However, when I decode the target sequences (to compute the BLEU score, for example), I get "OverflowError: out of range integral type conversion attempted" because of the -100 values.
Is there a direct way to call tokenizer.batch_decode with a custom padding token?
(The alternative would be to manually replace -100 with 0 before decoding, as done here, but I am looking for something more straightforward, if it exists.)
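
For reference, the manual workaround I mean would look roughly like the sketch below. It assumes the labels are available as a NumPy array (or a nested list, or a CPU tensor that converts to one) and that the pad token id is 0, as it is for T5. The helper name replace_label_padding is just something I made up for illustration.

```python
import numpy as np

IGNORE_INDEX = -100  # same value as label_pad_token_id in the collator


def replace_label_padding(labels, pad_token_id=0):
    """Return a copy of `labels` with IGNORE_INDEX swapped for pad_token_id.

    Hypothetical helper; pad_token_id=0 is an assumption that matches T5.
    """
    labels = np.asarray(labels)
    # Keep real token ids, replace only the ignore-index positions.
    return np.where(labels != IGNORE_INDEX, labels, pad_token_id)


# Toy batch of label ids, padded with -100 as the collator would do.
labels = [[71, 1712, 1, -100, -100], [37, 1, -100, -100, -100]]
cleaned = replace_label_padding(labels)

# `cleaned` no longer contains negative ids, so it can then be decoded, e.g.
# t5_tokenizer.batch_decode(cleaned, skip_special_tokens=True)
```

With skip_special_tokens=True the substituted pad tokens are dropped from the decoded strings, so they should not affect the metric.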
Thanks a lot in advance!