Hello,
I am using t5-base to map phrases into categories, for example: “I want to eat” → “hunger”. I have hundreds of different categories, and each category may have 1-3 phrases. Therefore, this is not a classical multi-class classification task. I would rather call it a text-to-text mapping task.
Is there any way to get a probability for each of the result values returned for a phrase (see code snippet below)?
For example, if the phrase is “He is hungry”, the model returns the top 5 most relevant labels. These results seem to be ordered by some relevance rank, so that the most relevant label is always first in outputs. So, my question is: how can I retrieve these probabilities?
My final goal is to set a threshold on the probability, so that outputs would only include results that pass this threshold. If no result passes the threshold, that should mean that nothing relevant was found.
from transformers import T5ForConditionalGeneration, T5Tokenizer

t5_tokenizer = T5Tokenizer.from_pretrained('t5-base')
t5_model = T5ForConditionalGeneration.from_pretrained('t5-base')
...
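# `model` is defined in the part omitted above; `model.model` is the underlying T5 model.
model.model.eval()
outputs = model.model.generate(
    input_ids=test_input_ids,
    attention_mask=test_attention_mask,
    max_length=64,
    early_stopping=True,
    num_beams=10,              # beam search with 10 beams
    num_return_sequences=5,    # return the 5 best beams
    no_repeat_ngram_size=2
)
# Decode each returned sequence back into a label string.
for output in outputs:
    result = t5_tokenizer.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(result)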
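
One possible approach, sketched here under some assumptions: `generate` accepts `return_dict_in_generate=True` and `output_scores=True`, and with beam search the returned object then carries `sequences_scores` alongside `sequences`. Each score is the summed token log-probability of a beam, normalised by length, so `exp(score)` is better read as a relative confidence than as a calibrated probability, and the values over the returned labels do not sum to 1. The helper names `generate_with_scores` and `filter_by_threshold` are hypothetical, and the sketch assumes a single input phrase per call.

```python
import math


def filter_by_threshold(labels, log_scores, threshold):
    """Keep (label, confidence) pairs whose exp(log_score) is at least `threshold`."""
    kept = []
    for label, log_score in zip(labels, log_scores):
        confidence = math.exp(log_score)  # length-normalised, not a calibrated probability
        if confidence >= threshold:
            kept.append((label, confidence))
    return kept


def generate_with_scores(model, tokenizer, input_ids, attention_mask):
    """Run beam search for one phrase and return decoded labels with their beam scores."""
    out = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_length=64,
        early_stopping=True,
        num_beams=10,
        num_return_sequences=5,
        no_repeat_ngram_size=2,
        return_dict_in_generate=True,  # structured output instead of a bare tensor
        output_scores=True,            # required for sequences_scores to be filled in
    )
    labels = tokenizer.batch_decode(out.sequences, skip_special_tokens=True)
    log_scores = out.sequences_scores.tolist()  # one score per returned sequence
    return labels, log_scores
```

With the objects from the snippet above, this could be called as `labels, log_scores = generate_with_scores(model.model, t5_tokenizer, test_input_ids, test_attention_mask)` followed by `filter_by_threshold(labels, log_scores, threshold=0.5)`. An empty list would then mean that nothing relevant was found. The value 0.5 is only a placeholder; a usable threshold would have to be tuned on held-out phrases.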