I have a question. We use cross-entropy loss as the main metric in both training and validation to judge model performance in terms of overfitting, underfitting, and generalization, right? Since we use teacher forcing during training and then compute the loss, do we also use teacher forcing (rather than autoregressive decoding) during validation, so that the logits line up one-to-one with the ground-truth tokens when computing the loss?
The only difference would be that we do not update the parameters in the validation phase.
And is autoregressive decoding ONLY used in the test phase?
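To make the question concrete, here is a minimal sketch of the two modes as I understand them. It is plain Python with a toy stand-in for the decoder: `toy_logits`, the vocabulary size, and the use of token 0 as a start token are all assumptions made up for illustration, not any library's API.

```python
import math

VOCAB = 4  # toy vocabulary size; token 0 doubles as the start token (assumption)

def toy_logits(prefix):
    """Hypothetical stand-in for a decoder: scores the next token from the prefix."""
    last = prefix[-1]
    # Favour the token (last + 1) % VOCAB; every other token gets a lower score.
    return [2.0 if t == (last + 1) % VOCAB else 0.0 for t in range(VOCAB)]

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target token."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def teacher_forced_loss(target):
    """Mean per-token loss, feeding the ground-truth prefix at every step."""
    inputs = [0] + target[:-1]  # shift right: start token, then ground-truth tokens
    losses = [cross_entropy(toy_logits(inputs[: i + 1]), target[i])
              for i in range(len(target))]
    return sum(losses) / len(losses)

def greedy_decode(max_len):
    """Autoregressive generation: each step feeds back the model's own prediction."""
    seq = [0]
    for _ in range(max_len):
        logits = toy_logits(seq)
        seq.append(max(range(VOCAB), key=lambda t: logits[t]))
    return seq[1:]

target = [1, 2, 3, 0]
# Validation as described above: same teacher-forced pass as training, no parameter update.
val_loss = teacher_forced_loss(target)
# Test-time generation: no ground truth is fed in, only the model's own outputs.
generated = greedy_decode(len(target))
```

In the teacher-forced pass there is exactly one logit vector per ground-truth token, which is what makes the per-token cross-entropy well defined; in the autoregressive pass the output length and content depend on the model's own predictions.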
Could someone kindly help me, and also share a reference if you know of one?