Perplexity for BART summaries

Hi, I’m using a BART-large model trained on Gigaword for summarisation and am trying to calculate the perplexity of the output summaries.

I’m doing the following since I’m using beam search:

    import torch
    from datasets import load_dataset
    from transformers import BartTokenizerFast, BartForConditionalGeneration

    model_checkpoint = 'a1noack/bart-large-gigaword'
    tokenizer = BartTokenizerFast.from_pretrained(model_checkpoint)
    model = BartForConditionalGeneration.from_pretrained(model_checkpoint, return_dict=True)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    model.eval()

    test = load_dataset("gigaword", split='test[:20]')
    encodings = tokenizer(test['document'], return_tensors='pt', padding=True, truncation=True, max_length=1024).to(device)

    number_beams = 8
    # Pass the attention mask so padded positions are ignored during generation.
    result = model.generate(encodings['input_ids'], attention_mask=encodings['attention_mask'],
                            num_beams=number_beams, max_length=model.config.max_length,
                            return_dict_in_generate=True, output_scores=True, output_attentions=True)
    
    log_sent = []

    # result.scores is a tuple with one entry per generation step; each entry
    # has shape (batch_size * num_beams, vocab_size). For every input I take
    # the highest value of the last step's scores across its beams as that
    # sequence's score.
    for batch_num in range(0, result.scores[0].shape[0], number_beams):
        max_score = torch.tensor(-1e6, dtype=torch.float).to(device)
        for beam_num in range(number_beams):
            max_score = torch.max(torch.stack([torch.max(result.scores[-1][batch_num + beam_num]), max_score]))
        log_sent.append(max_score)

    print("Perplexity:", torch.exp(-torch.stack(log_sent).sum() / result.sequences.shape[1]))

This is based on my understanding of the answer by patrickvonplaten in Showing individual token and corresponding score during beam search - #2 by monmanuela, and of Generation Probabilities: How to compute probabilities of output scores for GPT2.

I’m unsure if this is the right way to use the scores output. I’m new to HF and NLP, and I haven’t been able to find a similar issue resolved on the forum, so it would be great if someone could confirm whether this is the right way to compute perplexity.
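
One more thing I came across while searching, though I haven’t verified it myself: newer transformers releases (4.26+, I believe) add model.compute_transition_scores(), which reconstructs the per-token log-probabilities of the returned sequences from scores and beam_indices. A sketch of how it might replace my loop above:

    # Sketch assuming a recent transformers version that provides
    # compute_transition_scores; for beam search, result.beam_indices is
    # returned when output_scores=True.
    transition_scores = model.compute_transition_scores(
        result.sequences, result.scores, result.beam_indices, normalize_logits=False
    )
    # Padded steps after EOS contribute a score of 0, so counting the strictly
    # negative entries recovers each summary's generated length.
    output_lengths = (transition_scores < 0).sum(dim=1)
    print("Perplexity per summary:", torch.exp(-transition_scores.sum(dim=1) / output_lengths))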

In case someone is still looking for a solution to this, here’s some sample code I wrote to get BART perplexity scores: Bart Token Level Perplexity. Note that it is for masked language modeling, not summarization, so it may need to be adapted for that task.
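
For the summarization case specifically, a rough adaptation (an untested sketch of my own, not the code from the link above; summary_perplexity is just a name I made up) would be to score a summary under the seq2seq model with teacher forcing and exponentiate the mean token cross-entropy that the model returns as its loss:

    import torch
    from transformers import BartTokenizerFast, BartForConditionalGeneration

    tokenizer = BartTokenizerFast.from_pretrained("a1noack/bart-large-gigaword")
    model = BartForConditionalGeneration.from_pretrained("a1noack/bart-large-gigaword").eval()

    def summary_perplexity(document, summary):
        # Hypothetical helper: encode the source document and use the summary's
        # token ids as the decoder labels.
        inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=1024)
        labels = tokenizer(summary, return_tensors="pt", truncation=True).input_ids
        with torch.no_grad():
            # With labels supplied, the model returns the mean cross-entropy
            # over the summary tokens as its loss.
            loss = model(**inputs, labels=labels).loss
        return torch.exp(loss).item()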