Running out of Memory with run_clm.py

MarkStrong · April 13, 2021, 9:38am

Hi,

First of all, thanks for creating such a cool library!

I have already successfully fine-tuned a GPT2 model, and now I want to fine-tune a GPT2-Large model on the same 1.4 GB training dataset, but I seem to be running out of memory.

When I run the run_clm.py script, the process usually ends with “Killed”. I am using the following command:

python run_clm.py \
--use_fast_tokenizer \
--model_name_or_path gpt2-large \
--train_file "/home/mark/Downloads/adp5/train2.txt" \
--validation_file "/home/mark/Downloads/adp5/test2.txt" \
--do_train \
--do_eval \
--fp16 \
--overwrite_cache \
--evaluation_strategy="steps" \
--output_dir finetuned \
--eval_steps 200 \
--num_train_epochs 1 \
--gradient_accumulation_steps 2 \
--per_device_train_batch_size 8

When monitoring memory usage, I can see that both system RAM (64 GB) and swap (16 GB) are completely used up, while GPU memory is never allocated.

I’ve tried using DeepSpeed as well, but I end up with the same result.

Does anybody know what’s wrong?

Cheers,
Mark

lewtun · April 13, 2021, 7:23pm

Hey @MarkStrong, do you still get memory issues if you reduce the batch size?
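When lowering the batch size, you can keep the number of examples per optimizer update unchanged by raising gradient accumulation: with the Hugging Face Trainer, the effective batch size is per_device_train_batch_size × gradient_accumulation_steps × number of devices. The helper below is a hypothetical sketch (not part of run_clm.py) and assumes a single GPU, as in the original command (8 × 2 = 16). Note that batch size mainly affects memory during the training step itself; since the original post reports that GPU memory was never allocated, the exhaustion may be happening earlier, during preprocessing, in which case this alone may not help.

```python
def accumulation_for(target_batch: int, per_device_batch: int, num_devices: int = 1) -> int:
    """Return the gradient_accumulation_steps that keeps the effective batch size
    (per_device_batch * accumulation * num_devices) equal to target_batch.
    Hypothetical helper for illustration only."""
    per_step = per_device_batch * num_devices
    if target_batch % per_step != 0:
        raise ValueError(f"target batch {target_batch} is not a multiple of {per_step}")
    return target_batch // per_step


# Original command: 8 examples per device x 2 accumulation steps on one GPU.
effective = 8 * 2 * 1
for per_device in (8, 4, 2, 1):
    steps = accumulation_for(effective, per_device)
    print(f"--per_device_train_batch_size {per_device} --gradient_accumulation_steps {steps}")
```

Smaller per-device batches trade speed for memory; the printed flag pairs all correspond to the same effective batch of 16.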

yubi-sanprit · November 7, 2022, 7:52am

@lewtun, is there any way to lazy-load data from disk into RAM, i.e., so that only the part of the data currently being trained on is held in RAM?
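In plain Python, lazy loading means pulling only the lines needed for the current batch from an open file instead of reading the whole file into memory. The sketch below is a hypothetical helper (batched_lines is not a library function) that assumes one training example per non-empty line; the 🤗 Datasets library offers a comparable mode through load_dataset(..., streaming=True), which returns an iterable dataset instead of loading everything up front.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched_lines(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of at most batch_size non-empty lines, pulled lazily from
    `lines` (e.g. an open file object), so only one batch is in memory at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Generator expression: nothing is read until islice asks for the next item.
    stripped = (line.rstrip("\n") for line in lines if line.strip())
    while True:
        batch = list(islice(stripped, batch_size))
        if not batch:
            return
        yield batch


# Demo with an in-memory list; an open file object works the same way, e.g.
# with open("train2.txt", encoding="utf-8") as f: for batch in batched_lines(f, 1000): ...
sample = ["first line\n", "\n", "second line\n", "third line\n"]
for batch in batched_lines(sample, 2):
    print(batch)
```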

yubi-sanprit · December 14, 2022, 5:33am

@lewtun
To answer my own question about large datasets and lazy loading:
The 🤗 Datasets DatasetDict format and its map method, which you use to apply functions such as tokenization and grouping, are designed to run in batches. Because the data is stored in memory-mapped Arrow files on disk, map can work through a dataset larger than RAM one batch at a time. So, to work with a dataset of any size, convert it to the DatasetDict format and do the preprocessing with map.
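The batched pattern can be sketched in plain Python. The function below follows the shape of the group_texts step in run_clm.py (concatenate a batch of tokenized texts, then cut it into block_size chunks), but it is a standalone sketch, not the script's exact code. It takes a dict of columns mapping to lists of token-ID lists, the shape Dataset.map passes to a function when batched=True; the token IDs and block_size=4 are made up for the demo.

```python
from itertools import chain
from typing import Dict, List

Batch = Dict[str, List[List[int]]]


def group_texts(examples: Batch, block_size: int) -> Batch:
    """Concatenate each column across the batch and split it into
    block_size-long chunks, dropping any remainder at the end of the batch."""
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total_length = len(next(iter(concatenated.values())))
    # Keep only a whole number of blocks; the leftover tokens are discarded.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # For causal LM training the labels are a copy of the inputs;
    # the model shifts them internally when computing the loss.
    result["labels"] = [list(block) for block in result["input_ids"]]
    return result


batch = {
    "input_ids": [[1, 2, 3], [4, 5], [6, 7, 8, 9]],
    "attention_mask": [[1, 1, 1], [1, 1], [1, 1, 1, 1]],
}
print(group_texts(batch, block_size=4))  # token 9 is dropped as the remainder
```

With the library, you would call something like tokenized.map(lambda ex: group_texts(ex, block_size), batched=True). map processes 1,000 examples per batch by default and writes its output to an on-disk cache, so memory use is roughly bounded by the batch rather than the whole dataset. Because grouping happens per batch, a remainder shorter than block_size is dropped at the end of each batch.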

Related topics

Topic	Category	Replies	Views	Activity
How to deal with tokenizer out of memory in run_clm.py	Beginners	0	306	March 22, 2023
Run_mlm.py cuda error memory after resuming a training	🤗Transformers	4	2921	April 21, 2021
Fine-Tune GPT-2 Spanish From Example Notebook OOM	Beginners	0	674	December 17, 2020
Saving memory with run_mlm.py with wikipedia datasets	🤗Transformers	0	723	March 4, 2021
How to train a language model from scratch when my dataset is bigger than RAM?	Beginners	19	9760	September 18, 2020