Hi, I’m using the BART-large model trained on Gigaword for summarisation, and I’m trying to calculate the perplexity of the generated summaries. Since I’m using beam search, I’m doing the following:
```python
import torch
from datasets import load_dataset
from transformers import BartForConditionalGeneration, BartTokenizerFast

model_checkpoint = 'a1noack/bart-large-gigaword'
tokenizer = BartTokenizerFast.from_pretrained(model_checkpoint)
model = BartForConditionalGeneration.from_pretrained(model_checkpoint, return_dict=True)
device = "cuda" if torch.cuda.is_available() else "cpu"

# First 20 articles from the Gigaword test split
test = load_dataset("gigaword", split='test[:20]')
encodings = tokenizer(test['document'], return_tensors='pt', padding=True,
                      truncation=True, max_length=1024).to(device)

model = model.to(device)
model.eval()

number_beams = 8
result = model.generate(encodings['input_ids'],
                        num_beams=number_beams,
                        max_length=model.config.max_length,
                        return_dict_in_generate=True,
                        output_scores=True,
                        output_attentions=True)
```
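For reference, this is how I’m reading the shapes of the `generate` output (my understanding, so please correct me if I’ve got it wrong):

```python
# result.scores is a tuple with one entry per generated step; each entry
# should have shape (batch_size * num_beams, vocab_size) and, as far as I can
# tell, holds the processed (log-softmax) next-token scores for every beam
print(len(result.scores))        # number of generation steps
print(result.scores[0].shape)    # torch.Size([batch_size * num_beams, vocab_size])
print(result.sequences.shape)    # torch.Size([batch_size, sequence_length])
```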
```python
log_sent = []
# result.scores[0] has batch_size * number_beams rows, so step through it
# one example (i.e. one group of 8 beams) at a time
for batch_num in range(0, result.scores[0].shape[0], number_beams):
    max_score = torch.tensor(-1e6, dtype=torch.float).to(device)
    # take the highest score at the last generation step across this example's beams
    for beam_num in range(number_beams):
        max_score = torch.max(torch.stack([torch.max(result.scores[-1][batch_num + beam_num]), max_score]))
    log_sent.append(max_score)

print("Perplexity:", torch.exp(-torch.stack(log_sent).sum() / result.sequences.shape[1]))
```
This is based on my understanding of the answers in Showing individual token and corresponding score during beam search - #2 by monmanuela (by patrickvonplaten) and Generation Probabilities: How to compute probabilities of output scores for GPT2.
I’m unsure if this is the right way to use the `scores` output. I’m new to HF and NLP, and I haven’t been able to find a similar issue resolved on the forum, so it would be great if someone could confirm whether this is the right way to compute perplexity.
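One alternative I considered, which I’d also appreciate a sanity check on: my assumption is that `result.sequences_scores` holds the sum of the per-token log-probabilities of each returned beam, divided by its length when the default `length_penalty=1.0` is used, in which case perplexity might fall out of it directly:

```python
# Alternative sketch (assumes sequences_scores is the length-normalised sum of
# per-token log-probs of the winning beam with length_penalty=1.0 — please
# correct me if that's wrong)
seq_scores = result.sequences_scores      # shape: (batch_size,)
ppl_per_summary = torch.exp(-seq_scores)  # one perplexity value per summary
print("Mean perplexity:", ppl_per_summary.mean().item())
```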