Model is getting loaded unevenly on GPUs

abpani1994 · July 11, 2024, 3:47am

I am trying to fine-tune the Llama 3 8B model using PEFT QLoRA.
I load a GPTQ model with AutoModelForCausalLM, and it gets loaded unevenly across the GPUs, which prevents me from using a batch size greater than 1.
I am using a context length of 8192 or 4096.
With 4096, I can't go above a batch size of 2.
[Screenshot 2024-07-08 at 7.31.22 PM]

Please help.

abpani1994 · July 11, 2024, 3:50am

Another example. This happens with all models, whether GPTQ or bnb (bitsandbytes).
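For readers hitting the same symptom, here is a minimal sketch of one thing to try: passing an explicit per-GPU memory budget through the `max_memory` argument of `from_pretrained` together with `device_map="auto"`. The helper names `build_max_memory` and `load_with_budget`, the GPU count, and the GiB figures are all hypothetical and not taken from the post; whether this actually evens out the load depends on the model, the quantization backend, and how much activation memory the long context needs.

```python
def build_max_memory(num_gpus, per_gpu_gib, first_gpu_reserve_gib=0, cpu_gib=None):
    """Build a max_memory mapping for from_pretrained(..., max_memory=...).

    Hypothetical helper: every GPU gets the same weight budget, except GPU 0,
    which can be given less so it keeps headroom for activations and buffers.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    max_memory = {}
    for gpu_index in range(num_gpus):
        reserve = first_gpu_reserve_gib if gpu_index == 0 else 0
        budget = per_gpu_gib - reserve
        if budget <= 0:
            raise ValueError("memory budget for a GPU must be positive")
        # GPU keys are integer device indices; values are size strings.
        max_memory[gpu_index] = f"{budget}GiB"
    if cpu_gib is not None:
        # Optional CPU offload budget, keyed by the string "cpu".
        max_memory["cpu"] = f"{cpu_gib}GiB"
    return max_memory


def load_with_budget(model_id, max_memory):
    """Load a causal LM with layers placed according to max_memory.

    transformers is imported lazily so the helper above can be used
    (and checked) on a machine without the library or any GPU.
    """
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",      # let accelerate place layers across devices
        max_memory=max_memory,  # cap how much each device may receive
    )


if __name__ == "__main__":
    # Assumed setup: 2 GPUs, 20 GiB usable each, 4 GiB held back on GPU 0.
    print(build_max_memory(num_gpus=2, per_gpu_gib=20, first_gpu_reserve_gib=4))
```

The resulting dictionary would then be passed to `load_with_budget` with the model ID you are already using. Note that `device_map="auto"` splits the model layer by layer rather than replicating it, so some imbalance during training can remain even when the weights are spread evenly.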
