CUDA OOM when LoRA fine-tuning a 35B model

Hi, I am trying to fine-tune a 35B model with LoRA (r = 64, alpha = 64). My per-device batch size is 2 with gradient accumulation of 2, and I am training on 8x A100 80GB GPUs with DeepSpeed ZeRO-2. I estimated that 3 GPUs should be enough for this, but I cannot even get it to run on 8 GPUs: I keep hitting CUDA OOM. I can't figure out why this discrepancy exists. It would be great if someone could explain what is happening.
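
For context, my rough estimate went something like this (assuming the frozen base weights sit in half precision): 35B params x 2 bytes is about 70 GB, plus a comparatively tiny LoRA adapter with its gradients and optimizer states, which I expected to fit across 3 x 80 GB A100s. Below is approximately what my training script looks like; the model path, LoRA target modules, and the toy dataset are placeholders, not my real values:

```python
# Minimal sketch of my setup (model path, target modules, and the toy
# dataset below are placeholders). Launched with:
#   deepspeed --num_gpus 8 train.py
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "path/to/35b-model"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)

# LoRA: r = alpha = 64, base weights frozen, only adapter weights trained.
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],  # placeholder module names
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# DeepSpeed ZeRO stage-2 config passed inline as a dict.
ds_config = {
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 2,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=2,   # batch size 2
    gradient_accumulation_steps=2,   # grad accumulation 2
    bf16=True,
    deepspeed=ds_config,
    num_train_epochs=1,
    logging_steps=10,
)

# Toy stand-in dataset so the sketch is self-contained.
train_dataset = Dataset.from_dict({"text": ["hello world"] * 64}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```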