About parameter updates for the Transformer

This may be a somewhat newbie question, but I haven't seen it addressed explicitly in the papers and tutorials I've read.

In the decoder, we usually use greedy decoding or beam search. In that case, does backpropagation happen every time the decoder outputs a word, i.e. are the parameters updated once per generated token?

If the ideal output is: I LOVE YOU [End]

Does this mean that backpropagation will happen 4 times?
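To make the question concrete, here is a rough sketch of the token-by-token loop I'm picturing. The tiny "decoder" and vocabulary are made up purely for illustration, not a real transformer:

```python
import torch

# Toy stand-in for a decoder: maps a token sequence to next-token logits.
vocab_size = 5  # e.g. {0: "[Start]", 1: "I", 2: "LOVE", 3: "YOU", 4: "[End]"}
embedding = torch.nn.Embedding(vocab_size, 8)
head = torch.nn.Linear(8, vocab_size)

tokens = [0]  # start token
for step in range(4):  # expecting "I LOVE YOU [End]" -> 4 generation steps
    hidden = embedding(torch.tensor(tokens)).mean(dim=0)  # stand-in for the decoder stack
    logits = head(hidden)
    next_token = int(torch.argmax(logits))  # greedy decoding
    tokens.append(next_token)
    # My question: does a loss.backward() / optimizer.step() happen here,
    # i.e. once per generated token (so 4 times in this example)?
print(tokens)
```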

Also, if I use a classification head from Hugging Face, will the decoder still run recursively (token by token)? I have read the relevant source code, for example for XLM, and it seems that the output of the model is just a tensor (before the linear classification head is added).
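For reference, this is roughly what I looked at. The checkpoint name and `num_labels` are just examples, and I'm assuming the standard `transformers` classes here, so treat it as a sketch of my reading rather than anything definitive:

```python
import torch
from transformers import XLMTokenizer, XLMModel, XLMForSequenceClassification

# Example checkpoint; any XLM checkpoint should behave the same way.
name = "xlm-mlm-en-2048"
tokenizer = XLMTokenizer.from_pretrained(name)
inputs = tokenizer("I love you", return_tensors="pt")

# Base model: as far as I can tell, a single forward pass returns a tensor of
# hidden states (batch, seq_len, hidden_size), with no token-by-token loop.
base = XLMModel.from_pretrained(name)
with torch.no_grad():
    hidden = base(**inputs).last_hidden_state
print(hidden.shape)

# With the classification head, the output is just logits over the labels,
# again from one forward pass.
clf = XLMForSequenceClassification.from_pretrained(name, num_labels=2)
with torch.no_grad():
    logits = clf(**inputs).logits
print(logits.shape)  # (batch, num_labels)
```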