Hey, I want to fine-tune `T5ForConditionalGeneration`. Very similar to this thread, but I want to use teacher forcing (i.e. use the `forward` method rather than `generate`).
One thing that crossed my mind is to inherit from `T5ForConditionalGeneration` and override `forward` with a copy of the original implementation (from the source code), changing only the loss to my own. Is that a good idea?
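Roughly what I have in mind is a minimal sketch like this (assuming the PyTorch class; `T5WithCustomLoss` and the loss used here are just placeholders for whatever I end up writing):

```python
import torch.nn as nn
from transformers import T5ForConditionalGeneration


class T5WithCustomLoss(T5ForConditionalGeneration):
    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        # Let the parent build decoder_input_ids from labels and run the model;
        # it also computes the default CrossEntropyLoss, which we overwrite below.
        kwargs["return_dict"] = True  # ensure we get a Seq2SeqLMOutput back
        outputs = super().forward(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=labels,
            **kwargs,
        )
        if labels is not None:
            # Stand-in for a custom loss; ignore_index=-100 mimics the default.
            my_loss_fn = nn.CrossEntropyLoss(ignore_index=-100, label_smoothing=0.1)
            logits = outputs.logits
            outputs.loss = my_loss_fn(logits.view(-1, logits.size(-1)), labels.view(-1))
        return outputs
```

The appeal is that I don't have to copy the whole `forward`; I just recompute the loss on `outputs.logits`, at the cost of one wasted default-loss computation.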
I will add more context below, but my question is how to use something other than `CrossEntropyLoss` as the T5 model's loss.
Context:
My task is: given tokenized song lyrics, classify the genres of that song (multi-label classification).
The output I want from T5 is a string containing all the predicted genres, followed by an EOS token, and then padding tokens to keep all samples at equal length (some songs have more than one genre). For example:
`'FUNK,POP,,,</s> <pad> <pad> <pad> <pad> <pad>'`
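For concreteness, here is roughly how I build these targets (a sketch; the checkpoint name, `MAX_GENRES`, and `max_length` are example values, not my real ones):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")

MAX_GENRES = 5
genres = ["FUNK", "POP"]

# Pad the genre list with empty slots so every target has the same comma count.
target = ",".join(genres + [""] * (MAX_GENRES - len(genres)))  # -> "FUNK,POP,,,"

enc = tokenizer(
    target,
    padding="max_length",
    max_length=16,
    truncation=True,
    return_tensors="pt",
)
# enc.input_ids now ends with the </s> id followed by <pad> ids, as in the string above.
```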
When I used the default T5 loss (training via `forward`), the training loss decreased, but the eval loss is bad and the model predicts nonsense. I think the model can just learn where to place the paddings to improve the loss, and that's not what I want.
I'm not sure how to implement a differentiable loss that ignores the pads, but for now I'm only asking how to change the loss in general.
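One thing I've seen suggested (continuing the tokenization sketch above) is replacing pad token ids in the labels with -100, which the default `CrossEntropyLoss` ignores, so maybe that alone fixes the padding issue even before a custom loss:

```python
# Positions set to -100 are skipped by CrossEntropyLoss (its default
# ignore_index), so the pads stop rewarding the model for predicting padding.
labels = enc.input_ids.clone()
labels[labels == tokenizer.pad_token_id] = -100
```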
I can add the rest of my code if you want. Thanks!