I was able to finetune GPT2 355M at a 2048-token sequence length without FP16; everything fit in VRAM.
But no luck with GPT2 774M. It obviously didn't fit in VRAM, so I used FP16 and DeepSpeed CPU offload. That freed up about 9 GB of VRAM, but then I ran out of system RAM.
Has anyone succeeded in training GPT2 774M with 16 GB of VRAM?
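For reference, this is roughly the kind of setup I mean — a minimal sketch, not my exact script (model name, batch size, and the ZeRO options shown are illustrative):

```python
# Sketch: GPT2-large finetuning with FP16 + DeepSpeed ZeRO-2 optimizer CPU offload.
# The offload trades VRAM for system RAM, which is where my RAM runs out.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        # Move optimizer states (Adam moments) to CPU memory.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    # "auto" lets the HF Trainer fill these in from TrainingArguments.
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

model = AutoModelForCausalLM.from_pretrained("gpt2-large")  # 774M params
tokenizer = AutoTokenizer.from_pretrained("gpt2-large")

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,   # illustrative: smallest possible micro-batch
    gradient_accumulation_steps=16,  # illustrative: recover effective batch size
    gradient_checkpointing=True,     # trades compute for activation memory
    fp16=True,
    deepspeed=ds_config,             # a dict is accepted in place of a JSON path
)
# trainer = Trainer(model=model, args=args, train_dataset=...)  # dataset elided
```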