Model loads unevenly across GPUs with AutoModelForCausalLM

I am trying to do SFT with a context length of 4096.
The same setup works perfectly with Llama3 70B: model and cache loading is balanced across all GPUs.
But when loading Qwen2 9B or Llama3 8B for fine-tuning, the memory usage is uneven.
I can't even run a batch size of 2 on 4 A10 GPUs.
[Screenshot 2024-06-22: uneven per-GPU memory usage when loading with a batch size of 1]
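For reference, here is a minimal sketch of the kind of workaround I have been trying: capping per-GPU memory via `max_memory` so that `device_map="auto"` spreads the shards instead of filling one card first. The model id and the memory figures are placeholders (assuming 4x A10 with 24 GB each, leaving headroom for activations), not my exact code:

```python
# Sketch: build a per-device memory cap for device_map="auto" sharding.
# Figures are assumptions for 4 x A10 (24 GB); adjust for your hardware.

def build_max_memory(n_gpus, per_gpu="20GiB", cpu="64GiB"):
    """Build the max_memory dict accepted by from_pretrained."""
    caps = {i: per_gpu for i in range(n_gpus)}
    caps["cpu"] = cpu  # allow spill-over to CPU RAM if a GPU cap is hit
    return caps

max_memory = build_max_memory(4)  # leaves ~4 GiB headroom per GPU

# The actual loading call would then look like (placeholder model id):
# from transformers import AutoModelForCausalLM
# import torch
# model = AutoModelForCausalLM.from_pretrained(
#     "meta-llama/Meta-Llama-3-8B",   # placeholder model id
#     torch_dtype=torch.bfloat16,
#     device_map="auto",              # shard layers across visible GPUs
#     max_memory=max_memory,          # per-device caps built above
# )
```

Even with caps like these, the allocation still ends up skewed for the smaller models.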

Please help.