Dear HF, I have been trying to finetune the facebook/opt-13b model using the run_clm.py script in transformers/examples/pytorch/language-modeling. I am using 8 x 80gb a100’s on paperspace. The script works well for finetuning the smaller models. I keep running into a RuntimeError: CUDA out of me…

Awesome! Thanks so much for replying Sylvain!! Would you be able to have a look at this set up to see if there is anything you would improve because training is very expensive and I want to fix any obvious errors before starting! DeepSpeed Config JSON { "fp16": { "enabled": "auto", …

Fine-tune OPT 13B: CUDA out of memory error (720gb vram, batch size 1, fp16)!

Beginners

anujn June 24, 2022, 4:23pm 6

Worked like a charm! Thank you @sgugger you rock!!!

1 Like

Finetune LLM with DeepSpeed

Topic		Replies	Views
OPT Memory problem Beginners	2	813	June 2, 2022
Finetune LLM with DeepSpeed DeepSpeed	2	5113	February 22, 2024
Fine-tuning a 16B CodeGen model with 256GB RAM+2xA6000s? DeepSpeed	2	1644	July 3, 2023
RuntimeError: CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 15.78 GiB total capacity; 12.36 GiB already allocated; 302.75 MiB free; 14.16 GiB reserved in total by PyTorch) Beginners	2	1324	September 11, 2021
Training llama2-13b-16k model with peft on 3 A100 of 80GB is still throwing cuda out of memory 🤗Accelerate	0	790	October 16, 2023

Fine-tune OPT 13B: CUDA out of memory error (720gb vram, batch size 1, fp16)!

Related topics