Given a seq2seq paraphrase model `pp_model`, a tokenizer `pp_tokenizer`, a piece of text, and a few pre-determined paraphrases `pp_1`, `pp_2`, `pp_3`:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

pp_model = AutoModelForSeq2SeqLM.from_pretrained("tuner007/pegasus_paraphrase")
pp_tokenizer = AutoTokenizer.from_pretrained("tuner007/pegasus_paraphrase")

text = "I like to go to the beach."
pp_1 = "I really enjoy going to the beach."
pp_2 = "The beach is somewhere I like to go."
pp_3 = "I like driving to the beach and watching the waves flow."
```
How can I calculate the probability of `pp_model` generating each of these paraphrases, conditioned on `text`?
For context, I need this to work out the KL-divergence between a model and a reference model, using the formula KL = E_{x \sim p_{model}} [\log p_{model}(x) - \log p_{refmodel}(x)] (e.g. as done here).