Fine-tuning Pegasus

Hi @valhalla! The links are not working any more; can you point us to the new location? Thanks for your work!
Also, to fine-tune Pegasus, do we need two fields (document + summary, as in XSum), or can we fine-tune just the 'inner' language model (as in ULMFiT) to make the summaries better suited to the desired domain knowledge?
Thanks again.
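To be concrete, this is the two-field setup I mean (just a sketch, using google/pegasus-xsum as an example checkpoint and made-up text):

from transformers import PegasusTokenizer, PegasusForConditionalGeneration

model_name = "google/pegasus-xsum"  # example checkpoint
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

# each training example is a (document, summary) pair, XSum-style
document = "Long source text that we want to summarize ..."
summary = "Short reference summary."

inputs = tokenizer(document, truncation=True, max_length=512, return_tensors="pt")
labels = tokenizer(summary, truncation=True, max_length=64, return_tensors="pt")

# fine-tuning minimizes the cross-entropy loss between the decoder output and the labels
loss = model(input_ids=inputs["input_ids"],
             attention_mask=inputs["attention_mask"],
             labels=labels["input_ids"]).loss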

Hi! Can you be more specific about which examples you removed to improve this performance? I have seen these issues too. Right now I am seeing model instability, where it outputs different summaries for the same input sentence. Would love to get this model working, thank you.
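For reference, this is roughly how I'm generating right now (a sketch with an example checkpoint and made-up input); as far as I understand, with do_sample=False and plain beam search the output should be deterministic for a fixed input:

import torch
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

model_name = "google/pegasus-xsum"  # example checkpoint
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)
model.eval()  # make sure dropout is off

text = "Some input document ..."
batch = tokenizer(text, truncation=True, return_tensors="pt")

with torch.no_grad():
    # beam search with sampling disabled gives the same output for the same input
    generated = model.generate(**batch, num_beams=4, do_sample=False, max_length=64)

print(tokenizer.batch_decode(generated, skip_special_tokens=True))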

PS: If anyone is trying to train Pegasus using the Seq2SeqTrainer, here is a bit of code I took from finetune_trainer.py and utils.py in the legacy seq2seq folder. It freezes the embeddings of the model after you've loaded it.

from torch import nn 

def freeze_params(model: nn.Module):
    """Set requires_grad=False for each of model.parameters()"""
    for par in model.parameters():
        par.requires_grad = False

def freeze_embeds(model):
    """Freeze token embeddings and positional embeddings for bart, just token embeddings for t5."""
    model_type = model.config.model_type

    if model_type in ["t5", "mt5"]:
        freeze_params(model.shared)
        for d in [model.encoder, model.decoder]:
            freeze_params(d.embed_tokens)
    elif model_type == "fsmt":
        for d in [model.model.encoder, model.model.decoder]:
            freeze_params(d.embed_positions)
            freeze_params(d.embed_tokens)
    else:
        freeze_params(model.model.shared)
        for d in [model.model.encoder, model.model.decoder]:
            freeze_params(d.embed_positions)
            freeze_params(d.embed_tokens)

# call this on your Pegasus model after loading it
freeze_embeds(model)
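And here's roughly how I plug it into the Seq2SeqTrainer (just a sketch, not my full script: the checkpoint, the toy dataset, and the hyperparameters are placeholders):

from transformers import (
    DataCollatorForSeq2Seq,
    PegasusForConditionalGeneration,
    PegasusTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "google/pegasus-large"  # example checkpoint
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)
freeze_embeds(model)  # the helper defined above

# tiny toy dataset just to show the expected (document, summary) format
docs = ["Long source document one ...", "Long source document two ..."]
sums = ["Short summary one.", "Short summary two."]

def encode(doc, summ):
    features = tokenizer(doc, truncation=True, max_length=512)
    features["labels"] = tokenizer(summ, truncation=True, max_length=64)["input_ids"]
    return features

train_dataset = [encode(d, s) for d, s in zip(docs, sums)]

training_args = Seq2SeqTrainingArguments(
    output_dir="pegasus-finetuned",   # placeholder
    per_device_train_batch_size=1,
    num_train_epochs=3,
    learning_rate=5e-5,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()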

Hi, maybe this is a silly question, but… isn't there already a version of Pegasus pretrained on big_patent?

Yes, that model was released this year. We had been working on this since 2020, which is when we needed help with it, so I posted about it here. By the way, the variant of Pegasus fine-tuned on big_patent is pretty heavy, and its inference time is high too!

Ok, thanks! I'm working with BigBird-Pegasus, which handles longer sequences but does not use full n² attention, so let's see…
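For anyone following along, this is roughly how I'm loading it (a sketch, assuming the google/bigbird-pegasus-large-bigpatent checkpoint from the Hub; the input text and generation settings are placeholders):

from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

model_name = "google/bigbird-pegasus-large-bigpatent"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# block_sparse attention avoids the full n² attention matrix,
# which is what lets it handle sequences of around 4k tokens
model = BigBirdPegasusForConditionalGeneration.from_pretrained(
    model_name, attention_type="block_sparse"
)

text = "A long patent description ..."
inputs = tokenizer(text, truncation=True, max_length=4096, return_tensors="pt")
summary_ids = model.generate(**inputs, num_beams=4, max_length=256)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))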

Check this application out: http://summent-summarizer.herokuapp.com/

We built it on BigBird-Pegasus!


Cool!
One question: what GPU did you use to train such a huge model?
With a 16 GB GPU on Colab Pro I'm running out of memory…

We did not fine-tune bigbird-pegasus-bigpatent; it gave decent-at-best results for our use case. But if you'd like to fine-tune it, you'll need a large GPU SageMaker notebook instance on AWS or a large GCP server. How many training examples are you using right now?

Not many; we are testing with around 100 examples, but we can curate more (maybe 1,000 at most).

If you're fine-tuning with decent-sized samples, I would suggest trying with 10 to begin with!

The problem is that we are working with maximum-length sequences, around 4,000 tokens, and even with batch_size = 1 my 16 GB GPU runs out of memory…

Since I am working with batch_size = 1, the number of samples should not matter, should it?

I think the model by itself performs decently well, so if you're going for 4k tokens anyway, you could fine-tune it on 10 examples for maybe 3 epochs and see how it comes out.
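If memory is the main blocker at 4k tokens, a few Seq2SeqTrainingArguments settings usually help a lot on a 16 GB card. This is just a sketch of what I'd try (the output directory and step counts are placeholders, and gradient_checkpointing needs a reasonably recent transformers version): gradient checkpointing trades compute for memory, fp16 roughly halves activation memory, and gradient accumulation keeps the effective batch size up even with batch_size = 1.

from transformers import Seq2SeqTrainingArguments

# sketch of memory-saving settings for a 16 GB GPU; adjust to taste
training_args = Seq2SeqTrainingArguments(
    output_dir="bigbird-pegasus-finetuned",  # placeholder
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size of 8
    gradient_checkpointing=True,     # recompute activations to save memory
    fp16=True,                       # mixed precision (needs a CUDA GPU)
    num_train_epochs=3,
    predict_with_generate=True,
)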