How to add EOS when training T5?

Apparently a post-processor has to be added to the tokenizer so that it appends the EOS token (</s>) to every sequence. The tokenizers shipped with the official pretrained checkpoints already include it.

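from tokenizers.processors import TemplateProcessing

# `tokenizer` is assumed to be an already-loaded fast tokenizer;
# `_tokenizer` is its underlying `tokenizers.Tokenizer` backend.
tokenizer._tokenizer.post_processor = TemplateProcessing(
    single="$A </s>",            # one sequence: append </s>
    pair="$A </s> $B </s>",      # two sequences: append </s> after each
    special_tokens=[("</s>", tokenizer.eos_token_id)],
)

# Every non-padding row should now end with the EOS id.
inputs = tokenizer(['Hello world', 'Hello'], padding=True, truncation=True, max_length=100, return_tensors="pt")
labels = inputs["input_ids"]
print(labels)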
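
To see what the post-processor actually changes, here is a minimal sketch that uses only the tokenizers library, so it runs without downloading a T5 checkpoint. The toy vocabulary and its ids are made up for illustration and are not T5's real vocabulary; only the TemplateProcessing call mirrors the snippet above.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.processors import TemplateProcessing

# Hypothetical toy vocabulary (NOT the real T5 vocabulary).
vocab = {"<pad>": 0, "</s>": 1, "<unk>": 2, "Hello": 3, "world": 4}

tok = Tokenizer(WordLevel(vocab, unk_token="<unk>"))
tok.pre_tokenizer = Whitespace()
eos_id = tok.token_to_id("</s>")

# Without a post-processor, no EOS token is appended.
ids_before = tok.encode("Hello world").ids

# Same template as above: append </s> after each sequence.
tok.post_processor = TemplateProcessing(
    single="$A </s>",
    pair="$A </s> $B </s>",
    special_tokens=[("</s>", eos_id)],
)

ids_after = tok.encode("Hello world").ids
ids_pair = tok.encode("Hello", "world").ids

print(ids_before)  # [3, 4]
print(ids_after)   # [3, 4, 1]
print(ids_pair)    # [3, 1, 4, 1]
```

The same check works on a real tokenizer: compare the last non-padding id of each row against tokenizer.eos_token_id before and after setting the post-processor.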

Credit goes to arr10 on Stack Overflow.
