How to decode with custom pad tokens

Hi there,

I am implementing a seq2seq model. I am padding the target sequence using

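from transformers import DataCollatorForSeq2Seq

DataCollatorForSeq2Seq(
    tokenizer=t5_tokenizer,
    padding=self.padding,
    label_pad_token_id=-100,  # labels are padded with -100 so the loss ignores them
    return_tensors="pt",
)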

However, when I decode the target sequence (for example, to compute the BLEU score), I get OverflowError: out of range integral type conversion attempted, because of the -100 values.

Is there a direct way to pass a custom padding token to tokenizer.batch_decode?

(The alternative would be to manually replace -100 with 0 before decoding, as done here, but I am looking for something more straightforward, if it exists.)
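For reference, the manual substitution could look roughly like the sketch below. It is only an illustration under a few assumptions: the helper name replace_ignore_index is hypothetical, the token ids in the toy batch are made up, and 0 is used as the pad id because that is the T5 pad token id. The decoding call at the end is left as a comment since it needs a loaded tokenizer.

```python
from typing import List, Sequence

IGNORE_INDEX = -100  # same value as label_pad_token_id in the collator


def replace_ignore_index(
    label_ids: Sequence[Sequence[int]],
    pad_token_id: int,
    ignore_index: int = IGNORE_INDEX,
) -> List[List[int]]:
    """Return a copy of label_ids with every ignore_index swapped for pad_token_id."""
    return [
        [pad_token_id if tok == ignore_index else tok for tok in row]
        for row in label_ids
    ]


# Toy batch: the ids are made up; 0 stands in for the tokenizer's pad id.
batch = [[37, 1712, 1, -100, -100], [8, 5, 22, 9, 1]]
cleaned = replace_ignore_index(batch, pad_token_id=0)
print(cleaned)

# With a real tokenizer (not executed here), the calls would look like:
# cleaned = replace_ignore_index(labels.tolist(), t5_tokenizer.pad_token_id)
# texts = t5_tokenizer.batch_decode(cleaned, skip_special_tokens=True)
```

Passing skip_special_tokens=True when decoding drops the pad tokens from the output text, so the substituted positions do not show up in the strings used for the metric.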

Thanks a lot in advance!

Hey @pietrolesci
did you find any solution for this? I’m stuck with the same problem.

I am getting the same issue. Is there any solution?

Got the same problem.