I am trying to fine-tune the Llama 3 8B model with PEFT QLoRA.
When I load the GPTQ model with `AutoModelForCausalLM`, the weights get distributed unevenly across my GPUs, which prevents me from using a batch size larger than 1.
I am using a context length of 8192 or 4096; even at 4096 I can't go above a batch size of 2.
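For context, one direction I am considering is passing an explicit `max_memory` map to `from_pretrained` instead of relying on `device_map="auto"` alone, so the shards spread more evenly. A rough sketch of that idea — the helper function and the GPU sizes are hypothetical, not my actual setup:

```python
# Sketch: build an explicit max_memory map that reserves headroom on GPU 0
# (where activations and optimizer state tend to pile up), so accelerate
# places more of the weight shards on the other GPUs.
# All names and sizes here are illustrative.

def build_max_memory(num_gpus: int, per_gpu_gib: int, gpu0_reserve_gib: int = 4) -> dict:
    """Return a {device_id: "NGiB"} map with extra free space kept on GPU 0."""
    max_memory = {i: f"{per_gpu_gib}GiB" for i in range(num_gpus)}
    max_memory[0] = f"{per_gpu_gib - gpu0_reserve_gib}GiB"
    return max_memory

# Example: two 24 GiB GPUs, keep 4 GiB free on GPU 0 for activations.
mem_map = build_max_memory(num_gpus=2, per_gpu_gib=24)
print(mem_map)  # {0: '20GiB', 1: '24GiB'}

# The map would then be passed to the loader (not executed here):
# model = AutoModelForCausalLM.from_pretrained(
#     model_id, device_map="auto", max_memory=mem_map)
```

Would this kind of explicit map be the right way to fix the uneven placement, or is there a better approach?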
Please help.