How can I explicitly penalize a T5 model for wrong generations (cross entropy alone doesn't do the job)?

I’m trying to fine-tune a T5 language model in such a way that it is explicitly penalized for generating slightly incorrect answers. But which loss function should I use?

For example, I fine-tune the model on positive training tuples such as:

("[SYS] Where do you want to go ? [USR] Hello I want to book a flight from Atlanta to New York [SEP] ", "location_from=atlanta [SEP] location_to=new york [EOS]")

But I also want to explicitly penalize it for wrong answers. So if I have a set of “negative training tuples” such as:

("[SYS] Where do you want to go ? [USR] Hello I want to book a flight from Atlanta to New York [SEP] ", "location_from=new york [SEP] location_to=atlanta [EOS]")

I essentially want the loss function to output a large value whenever the model assigns a high probability to such a “wrong” next token (e.g. “new” right after “location_from=”). Plain cross entropy on the positive tuples only rewards the correct tokens; it never actively pushes probability away from these incorrect sequences.