Hi!
I was hoping to adopt the approach from the tutorial described here: https://www.philschmid.de/fine-tune-a-non-english-gpt-2-model-with-huggingface
I wanted to swap the German model for a Spanish one I found on the Hub (ensamblador/gpt2-es-48heads), but I'm running out of memory in the Colab notebook at this point:
from transformers import TextDataset, DataCollatorForLanguageModeling

def load_dataset(train_path, test_path, tokenizer):
    # Tokenize each file and split it into blocks of 128 tokens
    train_dataset = TextDataset(
        tokenizer=tokenizer,
        file_path=train_path,
        block_size=128)

    test_dataset = TextDataset(
        tokenizer=tokenizer,
        file_path=test_path,
        block_size=128)

    # mlm=False -> causal language modeling (GPT-2), not masked LM
    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=False,
    )
    return train_dataset, test_dataset, data_collator

train_dataset, test_dataset, data_collator = load_dataset(train_path, test_path, tokenizer)
My training set is about 200 MB. The obvious solution to me would be to pay for a Google AI Platform Notebook with more RAM and an extra GPU.
Is there a way to avoid the OOM on a free Colab notebook with a single GPU?
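In case it helps, one workaround I'm considering is streaming the corpus in fixed-size blocks instead of letting TextDataset tokenize the whole 200 MB file in memory at once. Here's a minimal sketch of the idea; `toy_encode` below is just a whitespace stand-in for the real GPT-2 tokenizer's `encode`, and I haven't verified this against the Trainer yet:

```python
import os
import tempfile

def stream_blocks(file_path, tokenizer_encode, block_size=128):
    """Yield fixed-size blocks of token ids, reading the file line by line
    so the whole corpus never sits in memory at once."""
    buffer = []
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            buffer.extend(tokenizer_encode(line))
            # Emit full blocks as soon as enough tokens accumulate
            while len(buffer) >= block_size:
                yield buffer[:block_size]
                buffer = buffer[block_size:]
    # Trailing tokens shorter than block_size are dropped, like TextDataset does

# Toy stand-in for tokenizer.encode: one "id" per whitespace token
def toy_encode(text):
    return [len(tok) for tok in text.split()]

# Tiny demo on a throwaway file: 1000 lines of 2 tokens each = 2000 tokens
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hola mundo\n" * 1000)
    path = tmp.name

blocks = list(stream_blocks(path, toy_encode, block_size=128))
print(len(blocks), len(blocks[0]))  # → 15 128  (2000 // 128 full blocks)
os.remove(path)
```

With the real model you'd pass `tokenizer.encode` and wrap the generator in a PyTorch `IterableDataset` before handing it to the Trainer; I'm treating this as a sketch, not a drop-in replacement.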