Combining tokenizer.decode and model.generate scores for probability prediction

Shaike04 · February 28, 2023, 3:15pm

I want to combine the per-input_id scores returned by model.generate(output_scores=True, return_dict_in_generate=True) with the tokenizer's decode function.

The logit scores are given per input_id, whereas decode works at the level of text tokens.
For a prompt of “hello, how are you”, my desired output would be, for example, “doing (p=0.8) today (p=0.8 * 0.7) sir (p=0.8 * 0.7 * 0.7)”.

How can I achieve this?

joaogante · March 1, 2023, 5:39pm

Hey @Shaike04

Have a look at our documentation here. I believe the first example does precisely what you want!
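
In case it helps, here is a minimal sketch of one way to get per-token running probabilities. It assumes a decoder-only (causal) model and a transformers version that provides model.compute_transition_scores. The helper names running_probabilities and annotate_generation are made up for this illustration and are not part of the library.

```python
import math


def running_probabilities(token_logprobs):
    """Turn per-token log-probabilities into (step_prob, cumulative_prob) pairs."""
    pairs = []
    total = 0.0
    for lp in token_logprobs:
        total += lp  # adding log-probs is the same as multiplying probabilities
        pairs.append((math.exp(lp), math.exp(total)))
    return pairs


def annotate_generation(model, tokenizer, prompt, max_new_tokens=5):
    """Greedily generate and pair each new token with its running probability.

    Hypothetical helper. Assumes a causal LM, so the prompt is echoed at the
    start of outputs.sequences and has to be sliced off.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        return_dict_in_generate=True,  # return a structured output object
        output_scores=True,            # keep the per-step scores
    )
    # Log-probability of each token that was actually selected at each step.
    transition_scores = model.compute_transition_scores(
        outputs.sequences, outputs.scores, normalize_logits=True
    )
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = outputs.sequences[0, prompt_length:]
    logprobs = [float(score) for score in transition_scores[0]]
    pieces = [tokenizer.decode(token) for token in new_tokens]
    return [
        (piece, step, cumulative)
        for piece, (step, cumulative) in zip(pieces, running_probabilities(logprobs))
    ]
```

With a model and tokenizer already loaded, annotate_generation(model, tokenizer, "hello, how are you") should return one (text, step probability, cumulative probability) tuple per generated token. Note that a single word can be split into several tokens, so the tuples are per token rather than per word.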

Shaike04 · March 1, 2023, 7:54pm

Thank you so much, I am looking into it.
