Importance of sentinel token placement in T5?

rasgaard · May 16, 2023, 11:59am

Hi there!

There is this paper that I have been trying to reproduce (https://arxiv.org/pdf/2205.11482.pdf) as part of my master’s thesis. It uses T5 to learn facts from a training set in which either the object or the subject is masked with a sentinel token. An example of a training sample (these are called abstracts) can be seen here:

Input: “Animal Farm is an allegorical and dystopian novella by <extra_id_0>, first published in England on 17 August 1945.”
Target: “<extra_id_0> George Orwell”

The entire dataset can be found on the Hugging Face Hub as ekinakyurek/ftrace.

What I’m wondering about is that in the docs, sentinel tokens are used as follows:
Input: “The <extra_id_0> walks in <extra_id_1> park”
Target: “<extra_id_0> cute dog <extra_id_1> the <extra_id_2>”
i.e. the input and the target are a sort of inverse of each other’s masking.
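
To make that convention concrete, here is a minimal sketch in plain Python of how I understand the docs format. The function name `span_corrupt` and the word-level spans are my own assumptions for illustration; this is not a Transformers API, and real preprocessing works on token IDs rather than whitespace-split words.

```python
def span_corrupt(words, spans):
    """Mask word spans in the format shown in the T5 docs example.

    `spans` is a list of half-open (start, end) word-index ranges,
    assumed to be sorted and non-overlapping.
    Returns (input_text, target_text).
    """
    input_parts, target_parts = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        # Input: keep the unmasked words, replace the span with a sentinel.
        input_parts.extend(words[cursor:start])
        input_parts.append(sentinel)
        # Target: the same sentinel, followed by the words it replaced.
        target_parts.append(sentinel)
        target_parts.extend(words[start:end])
        cursor = end
    input_parts.extend(words[cursor:])
    # Closing sentinel after the last masked span, as in the docs example.
    target_parts.append(f"<extra_id_{len(spans)}>")
    return " ".join(input_parts), " ".join(target_parts)


words = "The cute dog walks in the park".split()
inp, tgt = span_corrupt(words, [(1, 3), (5, 6)])
print(inp)
print(tgt)
```

With the spans above this reproduces the input/target pair from the docs, including the closing <extra_id_2> in the target.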

You will notice that this is not the case for the example from the dataset that I’m working with. If I’m right, the target should be “<extra_id_0> George Orwell <extra_id_1>”, since the input mask is in the middle of the abstract.

It is far from the only case, as you will see if you explore the dataset.
This has left me wondering how this “not-so-perfect” placement and formatting of sentinel tokens might affect the training of T5. Should it be considered a serious data-quality issue, or do its implications more or less go away when training on a lot of data?

Thanks for reading through my question! I hope someone will be able to clarify my doubts :)
