The docs state that the masked language modeling objective is simply
```python
input_ids = tokenizer.encode('The <extra_id_0> walks in <extra_id_1> park', return_tensors='pt')
labels = tokenizer.encode('<extra_id_0> cute dog <extra_id_1> the <extra_id_2> </s>', return_tensors='pt')
model(input_ids=input_ids, labels=labels)
```
I was wondering if I need to manually set the `additional_special_tokens_ids` (corresponding to the `<extra_id_#>` sentinels) in the labels to -100 during training so that they are ignored by the loss. It seems that at least the `pad_token_id` is changed to -100 in `examples/seq2seq`, but it's not clear whether the same should be done for the sentinels.
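For concreteness, here is the kind of label masking I mean; a minimal pure-Python sketch (on real tensors one would write something like `labels[labels == token_id] = -100`), where `mask_ids` and the concrete token ids are made up for illustration:

```python
def mask_ids(label_ids, ids_to_ignore):
    """Replace every id in ids_to_ignore with -100 so cross-entropy skips it."""
    return [-100 if t in ids_to_ignore else t for t in label_ids]

# Hypothetical ids: suppose 0 is pad_token_id and 32099 is <extra_id_0>.
labels = [32099, 2024, 1234, 5678, 0, 0]

print(mask_ids(labels, {0}))           # mask padding only, as examples/seq2seq does
print(mask_ids(labels, {0, 32099}))    # additionally mask the sentinel -- is this needed?
```

The question is whether the second variant (masking the sentinel ids as well as padding) is required, or whether the model is supposed to predict the sentinels as part of the target sequence.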