Transformer loss


I am currently doing bi-level training, meaning there are two models: one generates data, and the other is trained on that generated data.
In this setup, the generated data are one-hot encoded, and I want to use those one-hot encoded labels to train the second model.

However, the forward functions of BART and T5 have signatures like forward(input_ids, ..., labels, ..., decoder_input_ids, ..., decoder_inputs_embeds, ...), and the `labels` argument only accepts token indices, not one-hot encoded vectors. Therefore, I manually embed the one-hot labels and pass the result to `decoder_inputs_embeds`. Is that equivalent to passing these one-hot values via `labels`?
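For reference, here is a minimal sketch of the manual embedding step I mean (a standalone `nn.Embedding` stands in for the model's `get_input_embeddings()`, and the one-hot vectors are assumed to be hard, though in practice they could be soft, e.g. Gumbel-softmax outputs):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, d_model, batch, seq_len = 10, 4, 2, 3

# Stand-in for the model's shared embedding matrix,
# e.g. model.get_input_embeddings() in BART/T5.
embedding = torch.nn.Embedding(vocab_size, d_model)

# Token indices and their one-hot encoding (what my generator produces).
ids = torch.randint(0, vocab_size, (batch, seq_len))
one_hot = F.one_hot(ids, vocab_size).float()  # (batch, seq_len, vocab_size)

# "Manually embedding the label": matmul with the embedding matrix.
soft_embeds = one_hot @ embedding.weight      # (batch, seq_len, d_model)

# For hard one-hot vectors this matches an ordinary index lookup,
# so the result could be passed as decoder_inputs_embeds.
hard_embeds = embedding(ids)
print(torch.allclose(soft_embeds, hard_embeds))  # True
```

Note that this only feeds the decoder its inputs; unlike passing `labels`, no loss is computed and no shifting of the decoder inputs happens inside the model.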