Warm start with BigBird

nbroad · October 19, 2021, 7:07pm

I am interested in taking a Roberta model and turning it into a BigBird model that can handle long sequences. In the BigBird paper, they mention doing a warm start from the Roberta checkpoint, which I interpret to mean they loaded all the pre-trained weights into the new model before doing more training on long sequences. It looks like all of the model components are identical, apart from the full attention being replaced with block sparse attention and the positional embeddings being different sizes.

The main problem I have right now is dealing with the positional embeddings. BigBird’s positional embedding matrix is much larger than Roberta’s, because its shape is (max_position_embeddings x hidden_size):

Roberta has max_position_embeddings = 514
BigBird has max_position_embeddings = 4096

Any ideas on how I should handle this?

My only ideas right now are:

1. Reinitialize all positional embeddings.
2. Use the Roberta positional embeddings for the first 514 positions and reinitialize the rest.
3. Use the pretrained BigBird positional embeddings (but this feels like cheating and is definitely not what the original authors did).
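
The second idea (keep the pretrained rows, reinitialize the rest) can be sketched roughly as follows. This is only an illustration, not the BigBird authors' method: it uses NumPy arrays as stand-ins for the embedding weight matrices, `extend_position_embeddings` is a hypothetical helper name, and the hidden size of 768 and the init standard deviation of 0.02 are assumptions.

```python
import numpy as np

ROBERTA_POSITIONS = 514   # from the post above
BIGBIRD_POSITIONS = 4096  # from the post above
HIDDEN_SIZE = 768         # assumption: a base-size model


def extend_position_embeddings(pretrained, target_len, init_std=0.02, seed=0):
    """Copy the pretrained rows and randomly initialize the remaining rows.

    `init_std` is an assumed value, not something stated in the thread.
    """
    old_len, hidden = pretrained.shape
    if target_len < old_len:
        raise ValueError("target_len must be at least the pretrained length")
    rng = np.random.default_rng(seed)
    # Start from a fully random matrix of the target shape ...
    extended = rng.normal(0.0, init_std, size=(target_len, hidden))
    extended = extended.astype(pretrained.dtype)
    # ... then overwrite the leading rows with the pretrained embeddings.
    extended[:old_len] = pretrained
    return extended


# Random stand-in for the pretrained Roberta position embedding matrix.
roberta_pos = np.random.default_rng(1).normal(
    0.0, 0.02, size=(ROBERTA_POSITIONS, HIDDEN_SIZE)
).astype(np.float32)

extended_pos = extend_position_embeddings(roberta_pos, BIGBIRD_POSITIONS)
print(extended_pos.shape)
```

In a real conversion the resulting array would still need to be copied into the new model's position embedding weight, and the newly initialized rows would only become useful after further training on long sequences.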

nbroad · July 22, 2022, 2:56pm

For anyone who is interested, you can tile the Roberta position embeddings 8 times to get the right shape for BigBird (8 x 512 = 4096):

[512, 512, 512… ]
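
A minimal sketch of that tiling, using NumPy arrays as stand-ins for the real weight matrices. It assumes that the first 2 of Roberta's 514 rows are reserved for the padding offset (as in the Hugging Face implementation), so only the remaining 512 rows are tiled. `tile_position_embeddings` is a hypothetical helper name and the hidden size of 768 is an assumption.

```python
import numpy as np

BIGBIRD_POSITIONS = 4096  # from the thread
HIDDEN_SIZE = 768         # assumption: a base-size model
OFFSET = 2                # assumption: rows 0-1 are reserved, not learned positions


def tile_position_embeddings(roberta_pos, target_len, offset=OFFSET):
    """Repeat the learned Roberta position rows until target_len is reached."""
    learned = roberta_pos[offset:]        # shape (512, hidden)
    block = learned.shape[0]
    if target_len % block != 0:
        raise ValueError("target_len must be a multiple of the block length")
    # Stack the same 512-row block target_len // block times along axis 0.
    return np.tile(learned, (target_len // block, 1))


# Random stand-in for the pretrained Roberta position embedding matrix.
roberta_pos = np.random.default_rng(0).normal(
    0.0, 0.02, size=(514, HIDDEN_SIZE)
).astype(np.float32)

bigbird_pos = tile_position_embeddings(roberta_pos, BIGBIRD_POSITIONS)
print(bigbird_pos.shape)
```

Every 512-row block of the result is a copy of the same pretrained rows, so local position patterns inside a block are preserved while the model still has to learn to tell the blocks apart during the long-sequence training.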

rfarth · April 11, 2024, 2:34pm

Hey! How exactly did you do this? I’m trying to do the exact same thing.

nbroad · May 2, 2024, 12:00am

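Follow this notebook from the Longformer repo (`RoBERTa` --> `Longformer`: build a "long" version of pretrained models):

https://github.com/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb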


