I am using a generative model (T5, Flan, etc.) for classification. It is a 3-class problem with the labels: not vivid, moderately vivid, highly vivid. The model predicts the class label as text, but I also need the probability of each class, as I would get from a BERT model. With a fine-tuned BERT model this is easy: a classification head on the last layer returns one logit per class, and a softmax over those logits gives the class probabilities. However, BERT does not perform well in my scenario, whereas a generative model like T5 or Flan does. I don't know how to get per-class probabilities from these generative models, because they output a probability distribution over the vocabulary, not over the classes.
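To make the target concrete, here is a minimal sketch of the BERT-style output I would like to reproduce: three logits, one per class, turned into probabilities with a softmax. The logit values and the names `LABELS`, `softmax`, and `example_logits` are made up for illustration; they do not come from any real model.

```python
import math

# The three class labels from my task.
LABELS = ["not vivid", "moderately vivid", "highly vivid"]


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits, standing in for what a classification head would return.
example_logits = [0.2, 1.5, -0.3]

# One probability per class: this is the shape of output I am after.
class_probs = dict(zip(LABELS, softmax(example_logits)))
print(class_probs)
```

With a generative model, the scores coming out of the decoder cover the whole vocabulary at each step, so there is no direct equivalent of `example_logits` with exactly three entries.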