Calculate the probability of a given sequence for a seq2seq model

tomroth1001 · April 22, 2022, 1:39am

Given a seq2seq paraphrase model pp_model, a tokenizer pp_tokenizer, a piece of text, and a few pre-determined paraphrases pp_1, pp_2, pp_3:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

pp_model = AutoModelForSeq2SeqLM.from_pretrained("tuner007/pegasus_paraphrase")
pp_tokenizer = AutoTokenizer.from_pretrained("tuner007/pegasus_paraphrase")

text  = "I like to go to the beach."
pp_1  = "I really enjoy going to the beach."
pp_2  = "The beach is somewhere I like to go."
pp_3  = "I like driving to the beach and watching the waves flow."
```

How can I calculate the probability that pp_model generates each paraphrase, given text as input?
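A minimal sketch, assuming PyTorch and a transformers version recent enough to accept the `text_target` tokenizer argument. A seq2seq model factorizes log p(y | x) = Σ_t log p(y_t | y_<t, x), so the log-probability of a paraphrase is the sum of the per-token log-probabilities of its tokens under teacher forcing. The helper names `sequence_log_prob_from_logits` and `paraphrase_log_probs` are hypothetical, not part of transformers.

```python
import torch
import torch.nn.functional as F


def sequence_log_prob_from_logits(
    logits: torch.Tensor, labels: torch.Tensor, pad_token_id: int
) -> torch.Tensor:
    """Sum of per-token log-probs of `labels` under `logits`, ignoring padding.

    logits: (batch, tgt_len, vocab) decoder outputs aligned with labels
    labels: (batch, tgt_len) target token ids
    returns: (batch,) log p(target | source)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Pick out the log-prob the model assigned to each actual target token.
    token_log_probs = log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    # Padding positions are not part of the sequence, so zero them out.
    mask = labels.ne(pad_token_id)
    return (token_log_probs * mask).sum(dim=-1)


@torch.no_grad()
def paraphrase_log_probs(model, tokenizer, text, paraphrases):
    """Hypothetical helper: log p(paraphrase | text) for each candidate."""
    # Repeat the source once per candidate so the batch lines up.
    enc = tokenizer([text] * len(paraphrases), return_tensors="pt",
                    padding=True, truncation=True)
    labels = tokenizer(text_target=paraphrases, return_tensors="pt",
                       padding=True, truncation=True).input_ids
    # Passing `labels` makes the model build decoder_input_ids by shifting
    # the labels right, so logits[:, t] is the distribution over labels[:, t].
    out = model(**enc, labels=labels)
    return sequence_log_prob_from_logits(out.logits, labels, tokenizer.pad_token_id)
```

Usage would look like `paraphrase_log_probs(pp_model, pp_tokenizer, text, [pp_1, pp_2, pp_3])`, which returns one log-probability per paraphrase (including the end-of-sequence token). Calling `.exp()` gives probabilities, but for whole sentences these are tiny, so staying in log space is safer. `out.loss` is not a substitute: it averages the negative log-likelihood over all non-ignored tokens in the batch rather than summing per sequence.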

For context, I need this to work out the KL divergence between a model and a reference model, using the formula $\mathrm{KL} = \mathbb{E}_{x \sim p_{\text{model}}}\left[\log p_{\text{model}}(x) - \log p_{\text{refmodel}}(x)\right]$ (e.g. as done here).
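Estimating this expectation requires samples x drawn from p_model itself, each then scored under both models. A minimal sketch, assuming the scoring is done with some sequence log-probability function (for example one built from teacher-forced logits); `sample_from_model` and `kl_estimate` are hypothetical helper names. The sampling arguments are set explicitly because a checkpoint's generation config can enable beam search, and the library's default `top_k` is 50; either would draw samples from a distorted distribution rather than from p_model.

```python
import torch


@torch.no_grad()
def sample_from_model(model, tokenizer, text, n_samples=16, max_new_tokens=64):
    """Hypothetical helper: draw paraphrases from p_model(. | text) by pure sampling."""
    enc = tokenizer([text], return_tensors="pt", truncation=True)
    out = model.generate(
        **enc,
        do_sample=True,
        num_beams=1,        # no beam search
        top_k=0,            # disable top-k filtering
        top_p=1.0,          # disable nucleus filtering
        temperature=1.0,    # sample from the unmodified distribution
        num_return_sequences=n_samples,
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(out, skip_special_tokens=True)


def kl_estimate(logp_model: torch.Tensor, logp_ref: torch.Tensor) -> torch.Tensor:
    """Monte Carlo estimate of KL(p_model || p_ref).

    Both tensors hold log p(x_i | text) for the SAME samples x_i ~ p_model,
    scored under the model and the reference model respectively.
    """
    return (logp_model - logp_ref).mean()
```

Both log-probabilities are conditioned on the same input text, so this estimates the KL divergence between the two conditional distributions for that input. A single small batch can even give a negative estimate; more samples reduce the variance. Capping `max_new_tokens` truncates long samples, and decoding then re-tokenizing a sample may not reproduce the exact generated token ids, so treat the result as approximate.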
