Hi, I am trying to fine-tune a 35B model with LoRA (r = 64, alpha = 64). My per-device batch size is 2 with gradient accumulation of 2, and I am training on 8 A100 80GB GPUs with DeepSpeed ZeRO-2. I estimated this setup should fit on 3 GPUs, but I cannot even get it to run on 8: I keep hitting CUDA OOM. I am unable to figure out why this discrepancy exists. It would be great if someone could explain why this is happening.
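For reference, here is a minimal sketch of my setup (I have not pasted my full script; the model name, `target_modules`, dataset, and the use of bf16 are placeholders/assumptions, but the batch size, accumulation, LoRA rank, and ZeRO-2 config path reflect what I described above):

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model

# Load the base model in bf16 (assumption: half-precision weights).
model = AutoModelForCausalLM.from_pretrained(
    "my-35b-model",                        # placeholder for the actual checkpoint
    torch_dtype=torch.bfloat16,
)

# LoRA config matching the post: r = 64, alpha = 64.
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],   # illustrative; depends on the model
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    bf16=True,
    deepspeed="ds_zero2.json",             # DeepSpeed ZeRO stage 2 config
)

# train_ds is my tokenized dataset (omitted here for brevity).
trainer = Trainer(model=model, args=training_args, train_dataset=train_ds)
trainer.train()
```

I launch it across the 8 GPUs with `deepspeed --num_gpus 8 train.py` (a standard launcher invocation, not my exact command line).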