Outputting relevance scores

I am currently training a model to score messages for relevance. We have hand-labelled data on how relevant each message is, given the message and its context. Now we’re wondering what the best way is to fine-tune a model to output relevance scores.

1 - Is there an existing model that does relevance scoring out of the box? The best approach I could find without fine-tuning is to extract the self-attention weights of an extractive summarization model and use them to estimate how relevant previous messages are.
2 - To fine-tune on relevance, I was thinking of taking T5, removing the LM head, and training it on a classification task, then using the output logit or probability as the score. Does this approach make sense? We care mainly about the ranking rather than the absolute score.
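
To make the ranking point in question 2 concrete, here is a minimal sketch in plain Python. It assumes the model emits one logit per message; the logits and labels are made-up illustration values, and `rank_order` and `pairwise_logistic_loss` are hypothetical helper names, not part of T5 or any library. It shows two things: the sigmoid is strictly increasing, so the logit and the probability give the same ordering, and a pairwise (RankNet-style) logistic loss is one possible alternative to a plain classification loss when only the order matters.

```python
import math


def sigmoid(x: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def rank_order(scores):
    """Return message indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def pairwise_logistic_loss(logits, labels):
    """Mean RankNet-style loss over all pairs (i, j) with labels[i] > labels[j].

    Each pair contributes log(1 + exp(-(logits[i] - logits[j]))), which is
    small when the more relevant message already has the higher logit.
    """
    losses = []
    for i in range(len(logits)):
        for j in range(len(logits)):
            if labels[i] > labels[j]:
                losses.append(math.log1p(math.exp(-(logits[i] - logits[j]))))
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical values for four messages sharing one context (assumption).
logits = [2.1, -0.3, 0.8, -1.5]   # raw model outputs
labels = [3, 1, 2, 0]             # hand-labelled relevance, higher = more relevant
probs = [sigmoid(z) for z in logits]

# Ranking by logit and ranking by probability are identical.
same_ranking = rank_order(logits) == rank_order(probs)

# The loss is lower when the logits agree with the labelled order.
loss_agree = pairwise_logistic_loss(logits, labels)
loss_reversed = pairwise_logistic_loss(logits[::-1], labels)
```

If only the order is used downstream, either output works as the score; the choice of training loss (classification versus pairwise) is the part that could change the ranking quality.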