Transformer loss

Onlydrinkwater · March 17, 2022, 9:32pm

Hi!

I am doing bi-level training, which means there are two models: one generates data, and the other is trained on the generated data.
The generated data are one-hot encoded, and I want to use those one-hot encoded labels to train the second model.

However, the forward functions in BART and T5 look like forward(input_ids, ... labels, ... decoder_inputs_embeds ... decoder_input_ids ...), and ‘labels’ only accepts token indices, not one-hot encoded labels. Therefore, I manually embed the labels and pass them to decoder_inputs_embeds. Is that the same as passing the one-hot encoded values to ‘labels’?
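
To make the question concrete, here is a minimal PyTorch sketch of the two roles involved. It does not call BART or T5. The `embedding` layer and the random `logits` are hypothetical stand-ins for the second model's token embedding table and decoder output, and all sizes are made up. It assumes, as in the Hugging Face seq2seq models as far as their documentation describes, that the label argument is the target of a cross-entropy loss while decoder_inputs_embeds is what the decoder reads as input, so the two are not interchangeable. Under that assumption, one-hot targets can be used by computing the loss from the logits by hand.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, hidden, batch, seq_len = 11, 8, 2, 5  # made-up sizes

# Hypothetical stand-ins, not the real BART/T5 modules.
embedding = torch.nn.Embedding(vocab_size, hidden)   # decoder token embeddings
logits = torch.randn(batch, seq_len, vocab_size)     # pretend decoder output

# The same targets in both forms: indices and one-hot vectors.
label_ids = torch.randint(0, vocab_size, (batch, seq_len))
one_hot = F.one_hot(label_ids, vocab_size).float()

# Role 1: decoder INPUT. Embedding a one-hot vector is a matrix product
# with the embedding table, and matches an index lookup.
embeds_from_one_hot = one_hot @ embedding.weight      # (batch, seq_len, hidden)
embeds_from_ids = embedding(label_ids)

# Role 2: loss TARGET. Cross-entropy against one-hot targets, written out
# by hand, matches the index-based cross-entropy.
loss_from_ids = F.cross_entropy(logits.view(-1, vocab_size), label_ids.view(-1))
loss_from_one_hot = -(one_hot * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

print(torch.allclose(embeds_from_one_hot, embeds_from_ids))
print(torch.allclose(loss_from_ids, loss_from_one_hot))
```

In this sketch, passing embedded labels as decoder input only covers the first role. A loss still has to be computed against the targets, and the hand-written form keeps the one-hot tensor in the computation graph, which matters if gradients should flow back to the generating model. The real models also shift the decoder input one position to the right relative to the targets, which this sketch leaves out.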
