Likelihood that an input sequence came from the training set

I’m wondering if there’s a way of using a transformer to produce a metric that scores an input sequence by how similar it is to the training data. My motivation: I’ve created my own tokeniser and pre-trained a RoBERTa model on a moderately large corpus of IoT device descriptions. The descriptions contain lots of abbreviations, unusual ways of delimiting the text, etc.

When I pre-train and then fine-tune a classifier, performance is good on some datasets and poor on others. I assume the variation is because some datasets aren’t similar enough to the pre-training data.

So ideally I’d like to compute P(x1, …, xn), where x1 … xn is the input sequence, i.e. if the sequence is similar to data seen in pre-training, P(x1, …, xn) should be higher than if not.
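Concretely, I imagine the score being something like a length-normalised sum of per-token log-probabilities, i.e. score(x) = (1/n) · Σi log P(xi | the rest of the sequence), so descriptions that resemble the pre-training corpus score higher than out-of-domain ones. The per-token conditioning is just my guess at how this would have to be framed, since RoBERTa is a masked model rather than a left-to-right one.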

Given that the encoder produces contextual embeddings rather than probabilities, I’m not sure whether this is possible?
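The only idea I’ve had so far is to go through the MLM head rather than the bare encoder: mask one position at a time and add up the log-probabilities the model assigns to the true tokens (what I believe the literature calls pseudo-log-likelihood or MLM scoring). Below is a rough sketch of what I mean; `"my-iot-roberta"` is just a placeholder for my own checkpoint and tokeniser, not a real model name. I’m not sure whether this is actually a sensible proxy for P(x1, …, xn), or whether there’s a better-established way to do it?

```python
# Rough sketch of a pseudo-log-likelihood score using a RoBERTa MLM head.
# "my-iot-roberta" is a placeholder for my own checkpoint and tokeniser.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("my-iot-roberta")
model = AutoModelForMaskedLM.from_pretrained("my-iot-roberta")
model.eval()

def pseudo_log_likelihood(text: str) -> float:
    """Sum of log P(x_i | all other tokens), masking one position at a time."""
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    total = 0.0
    with torch.no_grad():
        # Skip the <s> and </s> special tokens at the start and end.
        for i in range(1, len(input_ids) - 1):
            masked = input_ids.clone()
            masked[i] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
    return total

# Length-normalised version, so short and long descriptions are comparable:
# score = pseudo_log_likelihood(desc) / (number of non-special tokens in desc)
```

In practice I’d probably batch the masked copies rather than loop, but the loop shows the idea.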