About parameter updates for the Transformer

This may be a somewhat newbie question, but I haven't seen it addressed explicitly in the papers and tutorials I've read.

In the decoder, we usually use greedy decoding or beam search. In that case, does backpropagation happen every time the decoder outputs a word, i.e. are the parameters updated once per generated token?

If the ideal output is: I LOVE YOU [End]

Does this mean that backpropagation will happen 4 times?
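To make the question concrete, here is a rough sketch of the token-by-token loop I'm picturing. The tiny "decoder" and vocabulary are made up purely for illustration, not a real transformer:

```python
import torch

# Toy stand-in for a decoder: maps a token sequence to next-token logits.
vocab_size = 5  # e.g. {0: "[Start]", 1: "I", 2: "LOVE", 3: "YOU", 4: "[End]"}
embedding = torch.nn.Embedding(vocab_size, 8)
head = torch.nn.Linear(8, vocab_size)

tokens = [0]  # start token
for step in range(4):  # expecting "I LOVE YOU [End]" -> 4 generation steps
    hidden = embedding(torch.tensor(tokens)).mean(dim=0)  # stand-in for the decoder stack
    logits = head(hidden)
    next_token = int(torch.argmax(logits))  # greedy decoding
    tokens.append(next_token)
    # My question: does a loss.backward() / optimizer.step() happen here,
    # i.e. once per generated token (so 4 times in this example)?
print(tokens)
```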

Also, if I use a classification head from Hugging Face, will the decoder still run recursively (token by token)? I have read the relevant source code, for example for XLM, and it seems that the output of the model is just a tensor (before the linear classification head is added).
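For reference, this is roughly what I looked at. The checkpoint name and `num_labels` are just examples, and I'm assuming the standard `transformers` classes here, so treat it as a sketch of my reading rather than anything definitive:

```python
import torch
from transformers import XLMTokenizer, XLMModel, XLMForSequenceClassification

# Example checkpoint; any XLM checkpoint should behave the same way.
name = "xlm-mlm-en-2048"
tokenizer = XLMTokenizer.from_pretrained(name)
inputs = tokenizer("I love you", return_tensors="pt")

# Base model: as far as I can tell, a single forward pass returns a tensor of
# hidden states (batch, seq_len, hidden_size), with no token-by-token loop.
base = XLMModel.from_pretrained(name)
with torch.no_grad():
    hidden = base(**inputs).last_hidden_state
print(hidden.shape)

# With the classification head, the output is just logits over the labels,
# again from one forward pass.
clf = XLMForSequenceClassification.from_pretrained(name, num_labels=2)
with torch.no_grad():
    logits = clf(**inputs).logits
print(logits.shape)  # (batch, num_labels)
```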