Text generation confidence


I’m working on fine-tuning seq2seq models and have a question about validating the generated output. As far as I understand, when using beam search for text generation, the generate() function can return a sequences_scores value, as described here. Is it correct to treat this as the model’s confidence in the generated output, and would taking torch.exp() of sequences_scores give me an interpretable value from which I can draw conclusions about the quality of the results?
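
For reference, my current understanding is that with length_penalty=1.0, sequences_scores is the sum of the per-token log-probabilities divided by the sequence length, so exponentiating it yields the geometric mean of the per-token probabilities. A minimal sketch with made-up log-probabilities (the numbers are illustrative, not from a real model; math.exp here behaves the same as torch.exp on a scalar tensor):

```python
import math

# Hypothetical per-token log-probabilities for a single beam (assumed values).
token_logprobs = [-0.1, -0.3, -0.2, -0.4]

# With length_penalty=1.0, the beam's score is the summed log-probability
# normalized by sequence length.
length_penalty = 1.0
sequence_score = sum(token_logprobs) / (len(token_logprobs) ** length_penalty)

# Exponentiating the length-normalized score gives the geometric mean of the
# per-token probabilities, a value in (0, 1] that is easier to interpret.
confidence = math.exp(sequence_score)
print(round(confidence, 4))  # → 0.7788
```

So my question is essentially whether a value like 0.78 here can reasonably be read as "the model assigns roughly 78% probability per token on average" and used as a proxy for output quality.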