Fine-tuning Pegasus

Hi @valhalla! The links are not working any more; can you point us to the new location? Thanks for your work!
Also, to fine-tune Pegasus, do we need two fields (document + summary, as in XSum), or can we fine-tune just the 'inner' language model (as in ULMFiT) to make the summaries better suited to the desired domain knowledge?
Thanks again.
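To be concrete, this is the two-field setup I mean (just a sketch, using google/pegasus-xsum as an example checkpoint and made-up text):

from transformers import PegasusTokenizer, PegasusForConditionalGeneration

model_name = "google/pegasus-xsum"  # example checkpoint
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

# each training example is a (document, summary) pair, XSum-style
document = "Long source text that we want to summarize ..."
summary = "Short reference summary."

inputs = tokenizer(document, truncation=True, max_length=512, return_tensors="pt")
labels = tokenizer(summary, truncation=True, max_length=64, return_tensors="pt")

# fine-tuning minimizes the cross-entropy loss between the decoder output and the labels
loss = model(input_ids=inputs["input_ids"],
             attention_mask=inputs["attention_mask"],
             labels=labels["input_ids"]).loss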

Hi! Can you be more specific about which examples you removed to improve this performance? I have seen these issues too. Right now I am seeing model instability, where it outputs different summaries for the same input sentence. Would love to get this model working, thank you.
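For reference, this is roughly how I'm generating right now (a sketch with an example checkpoint and made-up input); as far as I understand, with do_sample=False and plain beam search the output should be deterministic for a fixed input:

import torch
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

model_name = "google/pegasus-xsum"  # example checkpoint
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)
model.eval()  # make sure dropout is off

text = "Some input document ..."
batch = tokenizer(text, truncation=True, return_tensors="pt")

with torch.no_grad():
    # beam search with sampling disabled gives the same output for the same input
    generated = model.generate(**batch, num_beams=4, do_sample=False, max_length=64)

print(tokenizer.batch_decode(generated, skip_special_tokens=True))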

PS: If anyone is trying to train Pegasus using the Seq2SeqTrainer, here is a bit of code I took from finetune_trainer.py and utils.py in the legacy seq2seq folder. It freezes the embeddings of the model after you've loaded it.

from torch import nn 

def freeze_params(model: nn.Module):
    """Set requires_grad=False for each of model.parameters()"""
    for par in model.parameters():
        par.requires_grad = False

def freeze_embeds(model):
    """Freeze token embeddings and positional embeddings for bart, just token embeddings for t5."""
    model_type = model.config.model_type

    if model_type in ["t5", "mt5"]:
        freeze_params(model.shared)
        for d in [model.encoder, model.decoder]:
            freeze_params(d.embed_tokens)
    elif model_type == "fsmt":
        for d in [model.model.encoder, model.model.decoder]:
            freeze_params(d.embed_positions)
            freeze_params(d.embed_tokens)
    else:
        freeze_params(model.model.shared)
        for d in [model.model.encoder, model.model.decoder]:
            freeze_params(d.embed_positions)
            freeze_params(d.embed_tokens)

# call this on your Pegasus model after loading it
freeze_embeds(model)
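And here's roughly how I plug it into the Seq2SeqTrainer (just a sketch, not my full script: the checkpoint, the toy dataset, and the hyperparameters are placeholders):

from transformers import (
    DataCollatorForSeq2Seq,
    PegasusForConditionalGeneration,
    PegasusTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "google/pegasus-large"  # example checkpoint
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)
freeze_embeds(model)  # the helper defined above

# tiny toy dataset just to show the expected (document, summary) format
docs = ["Long source document one ...", "Long source document two ..."]
sums = ["Short summary one.", "Short summary two."]

def encode(doc, summ):
    features = tokenizer(doc, truncation=True, max_length=512)
    features["labels"] = tokenizer(summ, truncation=True, max_length=64)["input_ids"]
    return features

train_dataset = [encode(d, s) for d, s in zip(docs, sums)]

training_args = Seq2SeqTrainingArguments(
    output_dir="pegasus-finetuned",   # placeholder
    per_device_train_batch_size=1,
    num_train_epochs=3,
    learning_rate=5e-5,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()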

Hi, maybe this is a silly question, but… isn't there already a version of Pegasus pretrained on big_patent?

Yes, that model was released this year. We had been working on this since 2020, which is when we needed help with it, so I posted about it here. By the way, the variant of Pegasus fine-tuned on big_patent is pretty heavy, and its inference time is high too!

Ok, thanks! I'm working with BigBird-Pegasus, which handles longer sequences but does not use full n² attention, so let's see…
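For anyone following along, this is roughly how I'm loading it (a sketch, assuming the google/bigbird-pegasus-large-bigpatent checkpoint from the Hub; the input text and generation settings are placeholders):

from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

model_name = "google/bigbird-pegasus-large-bigpatent"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# block_sparse attention avoids the full n² attention matrix,
# which is what lets it handle sequences of around 4k tokens
model = BigBirdPegasusForConditionalGeneration.from_pretrained(
    model_name, attention_type="block_sparse"
)

text = "A long patent description ..."
inputs = tokenizer(text, truncation=True, max_length=4096, return_tensors="pt")
summary_ids = model.generate(**inputs, num_beams=4, max_length=256)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))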

Check this application out: http://summent-summarizer.herokuapp.com/

We built it on BigBird-Pegasus!


Cool!
One question: what GPU did you use to train such a huge model?
With a 16 GB GPU on Colab Pro I'm running out of memory…

We did not fine-tune bigbird-pegasus-bigpatent; it gave decent-at-best results for our use case. But if you'd like to fine-tune it, you'll need a large GPU SageMaker notebook instance on AWS or a large GCP server. How many training examples are you using right now?

Not many; we are testing with around 100 examples, but we can curate more (maybe 1,000 at most).

If you're fine-tuning with decent-sized samples, I would suggest trying with 10 to begin with!

The problem is that we are working with maximum-length sequences, around 4,000 tokens, and even with batch_size = 1 my 16 GB GPU runs out of memory…

Since I am working with batch_size = 1, the number of samples should not matter, should it?

I think the model by itself performs decently well, so if you're going for 4k tokens anyway, you could fine-tune it on 10 examples for maybe 3 epochs and see how it comes out.
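If memory is the main blocker at 4k tokens, a few Seq2SeqTrainingArguments settings usually help a lot on a 16 GB card. This is just a sketch of what I'd try (the output directory and step counts are placeholders, and gradient_checkpointing needs a reasonably recent transformers version): gradient checkpointing trades compute for memory, fp16 roughly halves activation memory, and gradient accumulation keeps the effective batch size up even with batch_size = 1.

from transformers import Seq2SeqTrainingArguments

# sketch of memory-saving settings for a 16 GB GPU; adjust to taste
training_args = Seq2SeqTrainingArguments(
    output_dir="bigbird-pegasus-finetuned",  # placeholder
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size of 8
    gradient_checkpointing=True,     # recompute activations to save memory
    fp16=True,                       # mixed precision (needs a CUDA GPU)
    num_train_epochs=3,
    predict_with_generate=True,
)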