The post-processor has to be added manually: the official pretrained tokenizer ships with it already attached, but a tokenizer you train yourself does not, so no `</s>` (EOS) token gets appended to your sequences.
from tokenizers.processors import TemplateProcessing

# `tokenizer` is assumed to be an already-loaded fast tokenizer
# (e.g. one you trained yourself, which lacks the post-processor).
tokenizer._tokenizer.post_processor = TemplateProcessing(
    single="$A </s>",
    pair="$A </s> $B </s>",
    special_tokens=[("</s>", tokenizer.eos_token_id)],
)

inputs = tokenizer(
    ["Hello world", "Hello"],
    padding=True, truncation=True, max_length=100, return_tensors="pt",
)
labels = inputs["input_ids"]
print(labels)  # each sequence now ends with the </s> id (plus padding)
Credit goes to arr10 on Stack Overflow.
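To see the effect end to end without downloading a pretrained model, here is a minimal self-contained sketch. The tiny word-level vocabulary and the token ids are made up purely for illustration; the point is only that the `TemplateProcessing` post-processor is what appends `</s>`:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.processors import TemplateProcessing
from transformers import PreTrainedTokenizerFast

# Tiny hypothetical vocabulary; a real tokenizer would be trained on a corpus.
vocab = {"<pad>": 0, "</s>": 1, "<unk>": 2, "Hello": 3, "world": 4}
tok = Tokenizer(WordLevel(vocab, unk_token="<unk>"))
tok.pre_tokenizer = Whitespace()

tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tok,
    pad_token="<pad>",
    eos_token="</s>",
    unk_token="<unk>",
)

# Without a post-processor, no </s> is appended:
print(tokenizer("Hello world")["input_ids"])  # [3, 4]

# Attach the template, exactly as in the snippet above:
tokenizer._tokenizer.post_processor = TemplateProcessing(
    single="$A </s>",
    pair="$A </s> $B </s>",
    special_tokens=[("</s>", tokenizer.eos_token_id)],
)
print(tokenizer("Hello world")["input_ids"])  # [3, 4, 1] — </s> appended
```

The same idea applies to any fast tokenizer: the template string (`"$A </s>"`) controls where special tokens are placed around the encoded sequence.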