Hi, I’m using the BART large model trained on Gigaword for summarisation and was trying to calculate the perplexity of the output summary.
I’m doing the following since I’m using beam search:
```python
import torch
from datasets import load_dataset
from transformers import BartTokenizerFast, BartForConditionalGeneration

model_checkpoint = 'a1noack/bart-large-gigaword'
tokenizer = BartTokenizerFast.from_pretrained(model_checkpoint)
model = BartForConditionalGeneration.from_pretrained(model_checkpoint, return_dict=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
test = load_dataset("gigaword", split='test[:20]')

encodings = tokenizer(test['document'], return_tensors='pt',
                      padding=True, truncation=True, max_length=1024).to(device)
model = model.to(device)
model.eval()

number_beams = 8
result = model.generate(encodings['input_ids'],
                        num_beams=number_beams,
                        return_dict_in_generate=True,
                        max_length=model.config.max_length,
                        output_scores=True,
                        output_attentions=True)

# result.scores is a tuple with one tensor per generation step;
# each tensor has shape (batch_size * num_beams, vocab_size)
log_sent = []
for batch_num in range(0, result.scores[-1].shape[0], number_beams):
    max_score = torch.tensor(-1e6, dtype=torch.float).to(device)
    for beam_num in range(number_beams):
        # take the best score over the vocabulary, then over the beams
        max_score = torch.max(torch.stack(
            [torch.max(result.scores[-1][batch_num + beam_num]), max_score]))
    log_sent.append(max_score)

print("Perplexity:", torch.exp(-torch.stack(log_sent).sum() / result.sequences.shape[1]))
```
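For reference, this is the general relationship I'm trying to apply: perplexity is the exponential of the negative mean per-token log-probability. A toy sketch, independent of BART (the `token_log_probs` values below are made up for illustration; in a real run they would come from the model's scores):

```python
import math

# Hypothetical per-token log-probabilities for one generated summary.
token_log_probs = [-0.5, -1.2, -0.3, -2.0, -0.7]

# Perplexity = exp of the negative mean log-probability.
avg_neg_log_prob = -sum(token_log_probs) / len(token_log_probs)
perplexity = math.exp(avg_neg_log_prob)
print(perplexity)  # exp(0.94)
```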
This is based on my understanding of the answers in Showing individual token and corresponding score during beam search - #2 by monmanuela (answered by patrickvonplaten) and Generation Probabilities: How to compute probabilities of output scores for GPT2.
I’m unsure whether this is the right way to use the output of `scores`. I’m new to HF and NLP, and I haven’t been able to find a similar issue resolved on the forum, so it would be great if someone could confirm whether this is the right way to compute perplexity.