CUDA memory error even when passing the no_cuda argument

rimusa · November 23, 2022, 1:52pm

Hi, the servers at my university don’t have much GPU capacity, so the GPUs are in constant use. This means that if I try to run a model while someone else is using them, I get an out-of-memory error (CUDA error: out of memory).

Looking through the documentation, I saw that passing no_cuda=True to the TrainingArguments for a Trainer should stop it from using GPUs, even when they are available. However, when I call trainer.predict(dataset), I still get the CUDA out-of-memory error. Is this expected?

This is the code I’m using, for reference:

import pandas as pd
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

HF_model_name = "../models/test_results_0"

# Load the fine-tuned model and tokenizer; keep the model on the CPU
HF_model = AutoModelForSequenceClassification.from_pretrained(HF_model_name, output_hidden_states=True).to("cpu")
HF_tokenizer = AutoTokenizer.from_pretrained(HF_model_name)

# no_cuda=True asks the Trainer not to use GPUs, even when they are available
args = TrainingArguments(no_cuda=True, output_dir=".")

# Pass the arguments object to the Trainer as-is
trainer = Trainer(
    model=HF_model,
    args=args,
)

data_files = "../data/fakenews/cleaned_data/text_dataset.tsv"
df = pd.read_csv(data_files, sep="\t")

text = df.loc[0, "text"]

def tokenize_data(example):
    return HF_tokenizer(example["text"], padding="max_length")

print(len(text))

# Use only the first 200 characters to keep the test input small
text_small = text[:200]
tokenized_text = pd.DataFrame([text_small], columns=["text"])
dataset = Dataset.from_pandas(tokenized_text)
dataset = dataset.map(tokenize_data, batched=True)

T = trainer.predict(dataset)
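
For comparison, a different technique from no_cuda is to hide the GPUs from the process altogether by setting the CUDA_VISIBLE_DEVICES environment variable to an empty string before any CUDA-aware library is imported. The sketch below is only an illustration under that assumption: the helper names hide_gpus_from_process and cuda_is_visible are hypothetical, and whether this removes the error in the setup described above has not been confirmed here.

```python
import os
from typing import Optional


def hide_gpus_from_process() -> None:
    """Hide all CUDA devices from libraries that initialize CUDA afterwards.

    Assumption: this runs before torch/transformers touch the GPU,
    ideally as the very first statement of the script.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


def cuda_is_visible() -> Optional[bool]:
    """Return torch's view of CUDA availability, or None if torch is not installed."""
    try:
        import torch  # imported lazily so the environment variable is already set
    except ImportError:
        return None
    return torch.cuda.is_available()


hide_gpus_from_process()
print("CUDA visible to this process:", cuda_is_visible())
```

If the printed value is False, everything loaded afterwards in the same process (model, Trainer, predict) should have no GPU to allocate memory on. The same effect can be obtained from the shell by prefixing the command with CUDA_VISIBLE_DEVICES="".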
