What is the purpose of this fine-tuning?


I found the 🤗 Transformers Notebooks page (transformers 4.12.0.dev0 documentation) and from there the Google Colab notebook.

The notebook creates examples in which the input and the labels contain the same text. What is the purpose of such a model? Is it training some kind of autoencoder? I would think a more interesting challenge would be: given an input sample of text, make the label the continuation of that text.

Thank you,


As mentioned in the notebook, the first task is causal language modeling, i.e. predicting the next word (token). The notebook also says explicitly:

First note that we duplicate the inputs for our labels. This is because the model of the 🤗 Transformers library apply the shifting to the right, so we don’t need to do it manually.

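That is why the labels are identical to the inputs: the model shifts them internally, so each position is trained to predict the token that follows it.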
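To make the quoted note concrete, here is a minimal pure-Python sketch (not the library's code) of what happens when the labels are a copy of the inputs. Every prefix of the sequence becomes a training example whose target is the next token, which already covers the "continuation" objective from the question. The helper names `next_token_pairs` and `shifted_causal_lm_loss` are hypothetical; the loss mimics the shift that causal LM heads such as `GPT2LMHeadModel` apply inside `forward()` when `labels` is passed (logits at position i are scored against the label at position i + 1, and labels equal to -100 are ignored).

```python
import math

# Label value skipped by the loss (the default ignore_index of PyTorch's CrossEntropyLoss).
IGNORE_INDEX = -100


def next_token_pairs(input_ids):
    """List the (context, target) pairs a causal LM effectively learns from one sequence."""
    return [(input_ids[: i + 1], input_ids[i + 1]) for i in range(len(input_ids) - 1)]


def log_softmax(row):
    """Numerically stable log-softmax over one row of vocabulary scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def shifted_causal_lm_loss(logits, labels):
    """Mean cross-entropy where logits at position i are scored against labels[i + 1].

    logits: one list of vocabulary scores per position.
    labels: token ids, here simply a copy of input_ids.
    """
    shift_logits = logits[:-1]  # the last position has no next token to predict
    shift_labels = labels[1:]   # the first token is never a prediction target
    losses = [
        -log_softmax(row)[target]
        for row, target in zip(shift_logits, shift_labels)
        if target != IGNORE_INDEX
    ]
    return sum(losses) / len(losses)


# Usage: duplicate the inputs as labels, exactly as the notebook does.
input_ids = [5, 2, 7, 3]
labels = list(input_ids)
print(next_token_pairs(input_ids))  # [([5], 2), ([5, 2], 7), ([5, 2, 7], 3)]

vocab_size = 10
uniform_logits = [[0.0] * vocab_size for _ in input_ids]
print(shifted_causal_lm_loss(uniform_logits, labels))  # log(10), about 2.3026
```

Because of the internal shift, passing `labels=input_ids` never lets a position see its own target as input; the same tokens simply play two roles, context on the left and target on the right.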

Does the causal model make sure to switch the attention set around when doing training?

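I am not sure what you mean by “switch the attention set”. If you mean hiding future tokens, then yes: the model applies a causal attention mask so that each position can only attend to itself and earlier positions. Otherwise it could simply read the token it is supposed to predict, and you would see a perplexity of about 1 (a loss near 0) at the end of training.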
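Here is a minimal pure-Python sketch of such a causal mask, assuming single-head scaled dot-product attention over plain lists of vectors (all function names are hypothetical, not Transformers APIs). Position i may only attend to positions j <= i, so changing a later token cannot change the outputs at earlier positions.

```python
import math


def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j (only j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def masked_softmax(scores, mask):
    """Row-wise softmax in which masked (future) positions receive exactly zero weight."""
    weights = []
    for row, allowed in zip(scores, mask):
        kept = [s if ok else -math.inf for s, ok in zip(row, allowed)]
        m = max(kept)  # finite, because the diagonal (j == i) is always allowed
        exps = [math.exp(s - m) for s in kept]  # math.exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask."""
    d = len(queries[0])
    scores = [
        [sum(q * k for q, k in zip(qv, kv)) / math.sqrt(d) for kv in keys]
        for qv in queries
    ]
    weights = masked_softmax(scores, causal_mask(len(queries)))
    return [
        [sum(w * v[c] for w, v in zip(row, values)) for c in range(len(values[0]))]
        for row in weights
    ]


# Usage: altering the last token leaves the outputs at earlier positions unchanged.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_changed = [[1.0, 0.0], [0.0, 1.0], [5.0, -3.0]]
out = causal_self_attention(x, x, x)
out_changed = causal_self_attention(x_changed, x_changed, x_changed)
print(out[:2] == out_changed[:2])  # True
```

The first position can only attend to itself, so its output equals its own value vector; this is what prevents the "copy the answer" shortcut that would drive the perplexity toward 1.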