I am using t5-base to map phrases into categories, for example: “I want to eat” → “hunger”. I have hundreds of different categories, and each category may have 1-3 phrases. Therefore, this is not a classical multi-class classification task; I would rather call it a text-to-text mapping task.
Is there any way to get the probabilities of the result values returned for a phrase (see the code snippet below)?
For example, if the phrase is “He is hungry”, the model returns the top 5 most relevant labels. These results seem to be ordered by some relevance rank, so that the most relevant label is always first in outputs. So my question is: how can I retrieve these probabilities?
My final goal is to set a threshold on the probability, so that outputs only includes results that pass this threshold. If no result passes the threshold, that should mean that nothing relevant was found.
from transformers import T5ForConditionalGeneration, T5Tokenizer

t5_tokenizer = T5Tokenizer.from_pretrained('t5-base')
t5_model = T5ForConditionalGeneration.from_pretrained('t5-base')
...
# `model`, `test_input_ids` and `test_attention_mask` come from the omitted code above.
model.model.eval()
outputs = model.model.generate(
    input_ids=test_input_ids,
    attention_mask=test_attention_mask,
    max_length=64,
    early_stopping=True,
    num_beams=10,
    num_return_sequences=5,
    no_repeat_ngram_size=2
)

# Decode each of the 5 returned sequences back into a label string.
for output in outputs:
    result = t5_tokenizer.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(result)
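To make the thresholding goal concrete, here is a minimal sketch of the filtering step I have in mind. It assumes that a log-score per returned sequence is available; as far as I understand, calling generate with return_dict_in_generate=True and output_scores=True under beam search returns an object with sequences and sequences_scores, where each score is a length-normalized sum of token log-probabilities. The helper filter_by_threshold, the labels and the score values below are all hypothetical and only there for illustration. Also note that the exponential of a length-normalized log-score is only a rough confidence value, not a calibrated probability.

```python
import math


def filter_by_threshold(labels, log_scores, threshold):
    """Keep (label, confidence) pairs whose exp(log_score) is at least `threshold`.

    `log_scores` are assumed to be per-sequence log-scores, e.g. beam-search
    scores; exp() maps them into the (0, 1] range.
    """
    kept = []
    for label, log_score in zip(labels, log_scores):
        confidence = math.exp(log_score)
        if confidence >= threshold:
            kept.append((label, confidence))
    return kept


# Hypothetical decoded labels and log-scores, NOT real model output.
labels = ["hunger", "thirst", "fatigue"]
log_scores = [-0.05, -1.2, -3.0]

relevant = filter_by_threshold(labels, log_scores, threshold=0.5)
if relevant:
    for label, confidence in relevant:
        print(f"{label}: {confidence:.3f}")
else:
    print("nothing relevant found")
```

With a stricter threshold such as 0.99, the same call returns an empty list, which is the "nothing relevant found" case.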