Mismatch between memory estimate and Trainer API

Hey, I'm experimenting with a few language models for code generation and I keep running out of memory, no matter which model I use. I have 8 GB of VRAM, so I tried flax-community/gpt-neo-125M-code-clippy-dedup-2048, because accelerate estimate-memory estimated 1.89 GB of VRAM for training with Adam.
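For reference, this is roughly how I ran the estimate (typed from memory, so the exact flags may be slightly off):

accelerate estimate-memory flax-community/gpt-neo-125M-code-clippy-dedup-2048 --library_name transformers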

When I train, I get the following error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 246.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 0 bytes is free. Of the allocated memory 20.73 GiB is allocated by PyTorch, and 1.35 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
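I have not tried the allocator setting from the error message yet. As far as I understand it, the option would have to go at the very top of the script, before anything touches the GPU, something like this:

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # must be set before the first CUDA allocation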

Why does PyTorch allocate so much memory? This is what I have already tried:

  • I reduced the number of SQL files to 10
  • I reduced the batch size to 1 (a rough sketch of these experiments follows right after this list)
  • I tried reducing the max_length parameter as well, but it had no effect
  • I am using transformers 4.37.0 and torch 2.1.2+cu121
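This is roughly how I changed those settings (reconstructed from memory, not the exact runs; the full script further down is back to the defaults, and this sketch reuses its ds and tokenizer objects):

# batch size 1 via TrainingArguments instead of the default
training_args = TrainingArguments(
    "test-trainer",
    per_device_train_batch_size=1,
)

# shorter sequences instead of the model's full 2048 context window
tokenized_dataset = ds.map(
    lambda example: tokenizer(example["code"],
                              truncation=True,
                              padding="max_length",
                              max_length=512),
    batched=True,
)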

Here is my code for more context:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling, TrainingArguments, Trainer
from datasets import Dataset, load_dataset, Value, Features
from pathlib import Path


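# Take the first 10 local .sql files and split them 80/20 into train and test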
DATA_PATH = Path("./data")
files = [str(p) for p in DATA_PATH.glob("*.sql")][:10]
train_files, test_files = files[:int(len(files) * 0.8)], files[int(len(files) * 0.8):]

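# Load each .sql file as one example (sample_by="document") with a single string column "code"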
features = Features({'code': Value('string')})
ds = load_dataset("text", data_files={"train": train_files, "test": test_files}, sample_by="document", features=features)

tokenizer = AutoTokenizer.from_pretrained(
    # "stabilityai/stable-code-3b",
    # "deepseek-ai/deepseek-coder-1.3b-instruct",
    "flax-community/gpt-neo-125M-code-clippy-dedup-2048",
    trust_remote_code=True)

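# The GPT-Neo tokenizer has no pad token by default, so add an explicit [PAD] token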
tokenizer.add_special_tokens({'pad_token': '[PAD]'})

model = AutoModelForCausalLM.from_pretrained(
    # "stabilityai/stable-code-3b",
    # "deepseek-ai/deepseek-coder-1.3b-instruct",
    "flax-community/gpt-neo-125M-code-clippy-dedup-2048",
    trust_remote_code=True,
    torch_dtype="auto",
)

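# Use the model's full context window (max_position_embeddings, 2048 for this checkpoint) as the padding/truncation length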
max_length = model.config.max_position_embeddings

tokenized_dataset = (ds.map(
    lambda example: tokenizer(example["code"],
                              return_tensors="pt",
                              truncation=True,
                              padding="max_length",
                              max_length=max_length),
    batched=True,
    batch_size=1,
))
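# Causal-LM collator: with mlm=False the labels are a copy of input_ids (pad tokens are masked out with -100)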
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
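# Plain TrainingArguments with the defaults (per_device_train_batch_size is still 8 here)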
training_args = TrainingArguments("test-trainer")
model.cuda()


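# Standard Trainer setup; the OOM happens inside trainer.train()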
trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_dataset["train"],
    data_collator=data_collator,
    tokenizer=tokenizer
)

trainer.train()