Fine-tuning Llama 2 7b hf takes 160 hours on an RTX 4070?

Hi,

I would like to fine-tune Llama 2 7b on a single RTX 4070 GPU with a small dataset by running the autotrain command locally:

autotrain llm --train --project-name my-llm --model meta-llama/Llama-2-7b-hf --data-path timdettmers/openassistant-guanaco --use-peft --quantization int4 --lr 2e-4 --batch 2 --epochs 3 --trainer sft

But the time estimate still looks like this after running for almost half a day:
1%|▌ | 47/6393 [11:58:31<1612:04:01, 914.50s/it]
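
At the reported rate, the remaining time works out to roughly:

(6393 − 47) steps × 914.5 s/step ≈ 5,803,000 s ≈ 1,612 hours ≈ 67 days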

Is there any approach or parameter change that could make it quicker?
Any suggestions or help would be highly appreciated.

K

Are you maxing out your available VRAM?
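
You can watch nvidia-smi while the run is going, or query it from inside the script. A minimal sketch using PyTorch's memory counters (device index 0 assumed):

import torch

# Rough check of how much of the card's VRAM the run is actually using
gpu = torch.cuda.get_device_properties(0)
print(f"allocated: {torch.cuda.memory_allocated(0) / 1e9:.1f} GB")
print(f"reserved:  {torch.cuda.memory_reserved(0) / 1e9:.1f} GB")
print(f"total:     {gpu.total_memory / 1e9:.1f} GB on {gpu.name}")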

The available parameters are listed here:
autotrain-advanced/src/autotrain/trainers/clm/params.py at main · huggingface/autotrain-advanced (github.com)

Mixed precision may increase throughput, though I am unsure whether it can be used concurrently with PEFT.

auto_find_batch_size may ensure that you get a batch size that fully utilises your 4070's VRAM. A sketch of both settings follows below.
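
For reference, here is a minimal sketch of the equivalent knobs in plain transformers, which autotrain wraps under the hood. The exact autotrain CLI flag names may differ, so check the params.py file linked above; the batch size and gradient_accumulation_steps values here are only illustrative, and the output_dir reuses your project name.

from transformers import TrainingArguments

# Sketch only: the settings discussed above, expressed as TrainingArguments
args = TrainingArguments(
    output_dir="my-llm",
    num_train_epochs=3,
    learning_rate=2e-4,
    per_device_train_batch_size=8,   # start high; auto_find_batch_size backs off on CUDA OOM
    auto_find_batch_size=True,       # halves the batch size on OOM until the run fits in VRAM
    fp16=True,                       # mixed precision; bf16=True is another option on an RTX 4070
    gradient_accumulation_steps=4,   # illustrative value to keep the effective batch size up
)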
