Finetuned EncoderDecoder (RoBERTa): How to score decoder output confidence/context?


I have a large dataset mapping input sentences to pre-defined responses, with the goal of automating the process. The inputs are mental health statements about how a user is feeling, and the outputs are therapist responses offering guidance and support. I limit the inputs and outputs to specific lengths. I can't be sure of the dataset's consistency, since there are multiple therapists and many individual inputs, but the therapists should all ideally respond similarly, so that shouldn't be an issue.

It works fairly well, but it still occasionally responds with statements that don't make sense given the input. For example:

Input: “Morning: I woke up feeling a little blue today but then took a nice walk and had some breakfast and did breathing exercises, and things are starting to look up.”
Output: “Good for you on getting that walk in, fresh air and blood movement can be a great way to get your body aligned to start the day, especially when paired with breathing exercises. Sometimes it helps to eat breakfast with your meds before a walk.”

but sometimes for that input, the model will say something like:

Output: “It is tough waking up with the blues, but next time you can try taking a walk or using your breathing exercises to help you center yourself, you got this!”

Clearly this doesn't make much sense, since it suggests something the input indicated was already done.

For background: I'm using RoBERTa for both the encoder and decoder, and I've found that not tying (sharing) the weights works better. I train with a batch size of 96 for 4 epochs.

I have tried using GPT-2 as a perplexity filter on the outputs, but frankly the GPT-2 perplexity is all over the place for my data and doesn't align with what a correct response should look like, so I haven't had much luck. (Perhaps if I fine-tuned GPT-2 on my output dataset the perplexity would be more informative.)

That said, the model no longer spits out gibberish (a problem I used to have), which is awesome. But I can't figure out whether there is a way to define the confidence of an output given the input, or whether I'll have to build a secondary model that "checks" the output embeddings against the input embeddings to decide whether the answer is "correct". The intent is to automate knowing when to use or discard a response, generate another, etc.

I was curious whether anybody else has gone through this type of exercise before, and what they found successful?

Thanks in advance.