How can I rerank fine-tuned DialoGPT outputs with DialogRPT using Transformers?

I am not satisfied with the responses that DialoGPT produces: for the most part, they seem pretty random and AI-ish to me. I fine-tuned the model on my own dataset using Trainer, but that did not help much, and the responses are often just quotes from the dataset used out of context. So I decided to try the DialogRPT human-vs-rand and human-vs-machine models.

The problem is that I do not understand how to rerank DialoGPT responses with DialogRPT using Transformers. Should I use DialogRPT during fine-tuning to compute the loss? Or is it possible to plug it in as a LogitsProcessor? If so, how? As I understand it, Transformers’ generate() method outputs scores for every token, but DialogRPT outputs a single number for a whole response. How can I use that number to modify the scores of a response?
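To make the question concrete, here is a minimal sketch of what I imagine "reranking" could mean: sample several complete responses, score each one as a whole with DialogRPT, and keep the highest-scoring one, without touching per-token scores at all. I am not sure this is the intended approach. The helper names (`rerank`, `build_dialogrpt_scorer`, `generate_candidates`) are my own and hypothetical, the checkpoint names and sampling settings are assumptions, and the scoring input format (context and response joined by `<|endoftext|>`, sigmoid over the logits) is my reading of the DialogRPT model card.

```python
from typing import Callable, List, Tuple


def rerank(context: str, candidates: List[str],
           score_fn: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Score every candidate as a whole and sort best-first."""
    scored = [(cand, score_fn(context, cand)) for cand in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def build_dialogrpt_scorer(
        model_name: str = "microsoft/DialogRPT-human-vs-rand"):
    """Return score_fn(context, response) -> float backed by DialogRPT.

    Assumption: the checkpoint loads as a sequence classifier and expects
    context and response joined by <|endoftext|>.
    """
    import torch
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    def score_fn(context: str, response: str) -> float:
        ids = tokenizer.encode(context + "<|endoftext|>" + response,
                               return_tensors="pt")
        with torch.no_grad():
            logits = model(ids, return_dict=True).logits
        # One scalar per (context, response) pair, squashed to (0, 1).
        return torch.sigmoid(logits).item()

    return score_fn


def generate_candidates(context: str,
                        model_name: str = "microsoft/DialoGPT-medium",
                        n: int = 8) -> List[str]:
    """Sample n candidate responses (swap in a fine-tuned checkpoint path)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    input_ids = tokenizer.encode(context + tokenizer.eos_token,
                                 return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            input_ids,
            do_sample=True,              # sampling gives diverse candidates
            top_p=0.9,                   # assumed value, not tuned
            max_new_tokens=40,           # assumed value, not tuned
            num_return_sequences=n,
            pad_token_id=tokenizer.eos_token_id,
        )
    prompt_len = input_ids.shape[-1]
    # Strip the prompt tokens so only the generated response is decoded.
    return [tokenizer.decode(seq[prompt_len:], skip_special_tokens=True)
            for seq in outputs]
```

If this is roughly right, usage would be something like `rerank(ctx, generate_candidates(ctx), build_dialogrpt_scorer())[0][0]` to get the top response. Is that how DialogRPT is meant to be used, or should it be integrated into generation or training instead?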

I am new to machine learning and this stuff is quite overwhelming for me; any help is much appreciated!