How much memory would 13B take? 13B params × 4 bytes (fp32) ≈ 52 GB for the weights alone?
We are getting a CUDA OOM error while fine-tuning a 13B Llama model on a 4xA100 cluster. What might we be doing wrong?
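The 13 × 4 arithmetic above covers only fp32 weights. Full fine-tuning with Adam also holds gradients plus two optimizer moments per parameter, which is likely why 4xA100 runs out of memory without sharding. A minimal back-of-the-envelope sketch (assuming fp32 throughout and ignoring activations, which add more on top):

```python
PARAMS = 13e9  # 13B-parameter model

def finetune_gb(params: float, bytes_per_param: float) -> float:
    """Estimate memory in GB (1 GB = 1e9 bytes) for a given per-param cost."""
    return params * bytes_per_param / 1e9

# fp32 weights only: 4 bytes/param
weights_only = finetune_gb(PARAMS, 4)
# full Adam fine-tune: 4 (weights) + 4 (grads) + 8 (Adam m and v) = 16 bytes/param
full_adam = finetune_gb(PARAMS, 16)

print(f"fp32 weights alone:        {weights_only:.0f} GB")  # ~52 GB
print(f"fp32 fine-tune with Adam:  {full_adam:.0f} GB")     # ~208 GB
```

Under these assumptions, training state alone is roughly 208 GB before activations, so replicating it on each of four 80 GB A100s overflows a single GPU; techniques like ZeRO/FSDP sharding, mixed precision, or parameter-efficient methods (e.g. LoRA) are the usual ways to bring the per-GPU footprint down.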