Looking for model to evaluate potential responses

I’m looking for a text classification or text generation model that takes an input string and one or more candidate completions, and returns a score for each completion based on its appropriateness. For example, given the input string “What time is it?” and the candidate completion “No, thanks.”, it would return a very low score, because semantically the candidate is a very inappropriate response. The responses “It’s later than you think” or “It’s seven PM.” would hopefully get higher scores. Is there an existing model that can do this?

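Yes, use BertForNextSentencePrediction from the Hugging Face transformers library. Its next-sentence-prediction head takes a pair of texts and outputs how likely the second is to follow the first, which you can use as the score for each candidate.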
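Here is a minimal sketch of how that could look. The checkpoint name `bert-base-uncased` and the helper names `score_candidates` and `demo` are my own choices for illustration, not anything from the question. Keep in mind that the NSP head was pretrained to tell consecutive sentences from random ones, so treat the scores as a rough proxy for "appropriateness" rather than a calibrated measure.

```python
import torch
from transformers import BertForNextSentencePrediction, BertTokenizer


def score_candidates(prompt, candidates, tokenizer, model):
    """Return, for each candidate, the NSP probability that it follows the prompt."""
    # Encode every (prompt, candidate) pair as: [CLS] prompt [SEP] candidate [SEP]
    inputs = tokenizer(
        [prompt] * len(candidates),
        candidates,
        return_tensors="pt",
        padding=True,
        truncation=True,
    )
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (num_candidates, 2)
    # In the NSP head, class 0 means "B continues A" and class 1 means "B is random".
    return torch.softmax(logits, dim=-1)[:, 0].tolist()


def demo():
    # Assumed checkpoint; downloads the weights on first use.
    name = "bert-base-uncased"
    tokenizer = BertTokenizer.from_pretrained(name)
    model = BertForNextSentencePrediction.from_pretrained(name)

    prompt = "What time is it?"
    candidates = ["No, thanks.", "It's later than you think.", "It's seven PM."]
    for cand, score in zip(candidates, score_candidates(prompt, candidates, tokenizer, model)):
        print(f"{score:.3f}  {cand}")
```

Call `demo()` to try it on the examples from the question. If the raw scores do not separate good and bad responses well enough for your data, fine-tuning the same head on your own prompt/response pairs is one option to consider.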