Always getting RuntimeError: CUDA out of memory with Trainer


I am using Hugging Face on my Google Colab Pro+ instance, and I keep getting errors like

RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 15.78 GiB total capacity; 13.92 GiB already allocated; 206.75 MiB free; 13.94 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
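The error message itself points at PYTORCH_CUDA_ALLOC_CONF. Below is a minimal sketch of how that setting is usually applied in a notebook. It assumes the cell runs before torch is imported, so the CUDA caching allocator picks it up. The value 128 is an arbitrary example. The setting only helps with fragmentation, i.e. when reserved memory is much larger than allocated memory. It does not add GPU memory.

```python
import os

# Must be set before torch initializes CUDA; safest is before `import torch`.
# 128 MB is an arbitrary example value, not a recommendation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# import torch  # import torch (and transformers) only after this line
```

In the message above, reserved memory (13.94 GiB) is barely higher than allocated memory (13.92 GiB), so fragmentation is probably not the main problem in this case.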

I don't understand why. My dataset is microscopic (40K sentences), and all I am doing is loading bert-large-uncased and following along with the text classification notebook:

# on datasets >= 3.0, load_metric no longer exists; use evaluate.load('glue', 'sst2') there
from datasets import load_dataset, load_metric
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('bert-large-cased')
metric = load_metric('glue', 'sst2')
model = AutoModelForSequenceClassification.from_pretrained("bert-large-cased", num_labels=2)

My trainer args are super standard:

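import numpy as np
from transformers import TrainingArguments, Trainer

batch_size = 16

args = TrainingArguments(
    "bert-large-cased-finetuned-sst2",  # output_dir: hypothetical, the original args were cut off
    evaluation_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
)

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # logits have shape (n_examples, num_labels); take the highest-scoring class.
    # predictions[:, 0] would pass raw logits, not label ids, to the metric.
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)

trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],      # hypothetical name for the tokenized
    eval_dataset=encoded_dataset["validation"],  # splits from the notebook (not shown)
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)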


Am I missing something? Should I change some of the options?
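On which options to change: a common way to cut peak GPU memory without changing the effective batch size is to use smaller per-device batches and accumulate gradients. Below is a hedged sketch of such settings. The specific values are illustrative assumptions. The keys are existing TrainingArguments fields, but whether fp16 and gradient checkpointing are worth it on a P100 is not something this thread establishes.

```python
# Hypothetical memory-saving settings; each key is an existing TrainingArguments field.
memory_saving_kwargs = {
    "per_device_train_batch_size": 4,   # roughly 4x less activation memory per step than 16
    "per_device_eval_batch_size": 4,
    "gradient_accumulation_steps": 4,   # accumulate 4 small batches per optimizer step
    "fp16": True,                       # mixed precision (needs a CUDA GPU)
    "gradient_checkpointing": True,     # recompute activations in backward to save memory
}

def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update."""
    return (kwargs["per_device_train_batch_size"]
            * kwargs["gradient_accumulation_steps"]
            * num_devices)

print(effective_batch_size(memory_saving_kwargs))  # same as the original batch_size of 16

# With transformers installed, these would be passed as, e.g.:
# args = TrainingArguments("some-output-dir", evaluation_strategy="epoch",
#                          save_strategy="epoch", **memory_saving_kwargs)
```

Each optimizer update still sees 16 examples, but only 4 are in memory at once, so training takes more forward/backward passes per update.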

(Just posting this in case no one smarter posts a better idea.)

Colab's performance varies a lot. I have run the same script (on a dataset of 1,200 sentences) and sometimes got an out-of-memory error and sometimes not. My latest project has 270 sentences and ran fine on the first try.

Thanks, but Colab Pro+ gives you about 50 GB of RAM and a Tesla P100, so I should have enough RAM here…

Are you sure you're not running out of GPU RAM, though? I think the GPU is capped at 16 GB, and PyTorch is taking up 14 GB of it.

What do you mean by this? The amount of GPU memory used generally depends on the model, batch size, and sequence length.
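To make "batch size and sequence length" concrete, here is a back-of-the-envelope sketch of one part of activation memory: the attention score matrices. These grow linearly with batch size but quadratically with sequence length. The sketch assumes bert-large's configuration (24 layers, 16 attention heads) and fp32 values. It counts only a single (batch, heads, seq_len, seq_len) tensor per layer. Training can keep more than one tensor of this shape per layer, so treat the result as a lower bound, not a full memory estimate.

```python
def attention_probs_bytes(batch_size: int, seq_len: int, num_layers: int = 24,
                          num_heads: int = 16, bytes_per_value: int = 4) -> int:
    """Bytes for one (batch, heads, seq_len, seq_len) attention tensor in every layer.

    Defaults assume bert-large (24 layers, 16 heads) in fp32.
    """
    return batch_size * num_heads * seq_len * seq_len * num_layers * bytes_per_value

GIB = 1024 ** 3
for seq_len in (128, 256, 512):
    gib = attention_probs_bytes(batch_size=16, seq_len=seq_len) / GIB
    print(f"seq_len={seq_len}: {gib:.3f} GiB")
```

At batch size 16 this term alone goes from 0.375 GiB at 128 tokens to 6 GiB at 512 tokens. Doubling the sequence length quadruples it, while halving the batch size only halves it. Padding every example to 512 tokens is therefore much more expensive than padding each batch to its longest sentence.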

This is where things go over my head. If anyone smarter can interject, I'd appreciate it :sweat_smile:

But if you look at the error message OP posted, it appears that his GPU memory is being hogged by PyTorch?
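On the "hogged by PyTorch" question: "reserved" is memory held by PyTorch's caching allocator, and "allocated" is the part of it currently occupied by live tensors. The small sketch below pulls the numbers out of the message OP posted. The regexes assume this exact message format. It shows that the two figures are almost equal, so the memory is in active use by tensors rather than sitting idle in the cache.

```python
import re

ERROR = ("CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 15.78 GiB total "
         "capacity; 13.92 GiB already allocated; 206.75 MiB free; 13.94 GiB reserved "
         "in total by PyTorch)")

UNIT_TO_GIB = {"MiB": 1 / 1024, "GiB": 1.0}

def parse_oom(message: str) -> dict:
    """Extract the memory figures (in GiB) from an OOM message of the form above."""
    fields = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    out = {}
    for name, pattern in fields.items():
        value, unit = re.search(pattern, message).groups()
        out[name] = float(value) * UNIT_TO_GIB[unit]
    return out

m = parse_oom(ERROR)
# Cached by the allocator but not used by any tensor:
print(f"reserved - allocated = {m['reserved'] - m['allocated']:.2f} GiB")
# The failing request versus what the GPU still had free:
print(f"requested {m['requested']:.2f} GiB, free {m['free']:.2f} GiB")
```

Only about 0.02 GiB is cached but unused, and the 256 MiB request simply did not fit into the roughly 0.20 GiB that was still free.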

@BramVanroy thanks for your input. Is there a rough back-of-the-envelope calculation for how much memory I need to run a model? It seems bert-large-cased is quite big, but how big?
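One common rule of thumb for full fine-tuning with an Adam-style optimizer in fp32 is about 16 bytes per parameter: 4 for the weights, 4 for the gradients, and 8 for Adam's two moment estimates. Activations come on top of that. The sketch below applies the rule. The parameter count is an assumed approximate figure for BERT-large (often quoted as roughly 335M to 340M). For an exact count, sum(p.numel() for p in model.parameters()) on the loaded model gives the real number.

```python
def static_training_bytes(num_params: int, bytes_per_param: int = 4,
                          adam_states: int = 2) -> int:
    """Weights + gradients + Adam moment estimates, all in fp32.

    Activations (which grow with batch size and sequence length) and the CUDA
    context are NOT included.
    """
    weights = num_params * bytes_per_param
    grads = num_params * bytes_per_param
    optimizer = num_params * bytes_per_param * adam_states  # exp_avg and exp_avg_sq
    return weights + grads + optimizer

# Approximate figure, assumed for illustration; use the real count from the model.
APPROX_BERT_LARGE_PARAMS = 335_000_000
gib = static_training_bytes(APPROX_BERT_LARGE_PARAMS) / 1024 ** 3
print(f"~{gib:.1f} GiB before activations")
```

This gives roughly 5 GiB before a single activation is stored. On a 16 GB card, that leaves about 10 GB for activations, which batch size and sequence length can exhaust quickly for a 24-layer model.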


The question belongs in an FAQ.