I’m trying to fine-tune a T5 language model in such a way that it is explicitly penalized for generating slightly incorrect answers. But which loss function should I use?
Essentially, I want the loss to be large whenever the model assigns a high probability to the next “wrong” token.
For context, a regular (positive) training tuple of input and target looks like this:
("[SYS] Where do you want to go ? [USR] Hello I want to book a flight from Atlanta to New York [SEP] ", "location_from=atlanta [SEP] location_to=new york [EOS]")
But I also want to explicitly penalize the model for wrong answers, using a set of “negative training tuples” such as:
("[SYS] Where do you want to go ? [USR] Hello I want to book a flight from Atlanta to New York [SEP] ", "location_from=new york [SEP] location_to=atlanta [EOS]")
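One candidate that matches this description is an unlikelihood-style term (the idea from "unlikelihood training"; this is a suggestion, not something the setup above already uses). For a negative target it adds -log(1 - p_t) per token, which grows without bound as the probability p_t of the wrong token approaches 1. The sketch below is plain Python operating on per-token probabilities; the function names, the `mask` argument, and the `alpha` weight are hypothetical. With a real T5 model, p_t would presumably come from a softmax over the decoder logits when the negative target is fed in with teacher forcing.

```python
import math

def likelihood_loss(token_probs):
    """Usual negative log-likelihood for a positive target: -sum(log p_t)."""
    return -sum(math.log(p) for p in token_probs)

def unlikelihood_loss(token_probs, mask=None, eps=1e-8):
    """Penalty for a negative target: -sum(mask_t * log(1 - p_t)).

    token_probs: probability the model gives to each token of the WRONG target.
    mask (hypothetical): 1 for tokens to penalize, 0 for tokens to ignore.
    """
    if mask is None:
        mask = [1] * len(token_probs)
    # Clamp 1 - p_t away from zero so the log stays finite when p_t is 1.
    return -sum(m * math.log(max(1.0 - p, eps))
                for p, m in zip(token_probs, mask))

def combined_loss(pos_probs, neg_probs, neg_mask=None, alpha=1.0):
    """Likelihood on the positive tuple plus alpha times the penalty on the negative one."""
    return likelihood_loss(pos_probs) + alpha * unlikelihood_loss(neg_probs, neg_mask)

# The penalty is small when the wrong token is unlikely and large when it is likely.
low = unlikelihood_loss([0.1])
high = unlikelihood_loss([0.9])
```

One design point to consider: the negative target above shares tokens such as "location_from=" with the correct answer, so it may make sense to set the mask to 0 on those positions and only penalize the tokens that actually differ (for example "new york" in the location_from slot).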