I was hoping to adapt the approach from the notebook described here: https://www.philschmid.de/fine-tune-a-non-english-gpt-2-model-with-huggingface
I wanted to swap the German model for a Spanish one I found on the Hub (ensamblador/gpt2-es-48heads), but I'm running out of memory in the Colab notebook at this point:
from transformers import TextDataset, DataCollatorForLanguageModeling

def load_dataset(train_path, test_path, tokenizer):
    train_dataset = TextDataset(
        tokenizer=tokenizer,
        file_path=train_path,
        block_size=128)

    test_dataset = TextDataset(
        tokenizer=tokenizer,
        file_path=test_path,
        block_size=128)

    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer,
        mlm=False,
    )
    return train_dataset, test_dataset, data_collator

train_dataset, test_dataset, data_collator = load_dataset(train_path, test_path, tokenizer)
My training set is about 200 MB. The obvious solution to me would be to get more RAM via another GPU in a Google AI Platform Notebook.
Is there a way to avoid OOM on a free Colab notebook with a single GPU?
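Would swapping TextDataset for the datasets library help? This is just a rough sketch of what I had in mind, assuming the datasets Arrow cache is memory-mapped from disk so the 200 MB corpus never has to sit in RAM all at once (train_path, test_path and tokenizer are the same as above):

from datasets import load_dataset as hf_load_dataset  # aliased to avoid clashing with my load_dataset above
from transformers import DataCollatorForLanguageModeling

# GPT-2 tokenizers have no pad token, so reuse EOS for padding
tokenizer.pad_token = tokenizer.eos_token

# Read the raw text files into an on-disk Arrow cache instead of building everything in RAM
raw = hf_load_dataset("text", data_files={"train": train_path, "test": test_path})

def tokenize(batch):
    # Truncate to the same block size TextDataset used (128 tokens)
    return tokenizer(batch["text"], truncation=True, max_length=128)

# Tokenize in batches; the results are also written to the on-disk cache
tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
train_dataset, test_dataset = tokenized["train"], tokenized["test"]

From what I understand the Arrow cache is memory-mapped, so only the batches currently being processed should be held in RAM, but I'm not sure whether that's enough here or whether the OOM is actually coming from the GPU side during training.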