Best practices for training LLMs on long sequences?

Hello,

I am trying to fine-tune gemma-2-2b on long sequences (4000-8000 tokens).
Even with bf16 and 4-bit QLoRA, it still doesn't fit on a single 24GB GPU.
I want to scale this up to a multi-GPU instance, but that means I have to split the model across several GPUs.
What are the best practices for training LLMs on long sequences?
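For reference, here is a minimal sketch of roughly what I'm running. The exact LoRA rank, target modules, and quantization options are illustrative placeholders, not my full training script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "google/gemma-2-2b"

# 4-bit QLoRA quantization with bf16 compute, as described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

# Gradient checkpointing trades compute for activation memory,
# which is the dominant cost at 4k-8k token sequence lengths
model = prepare_model_for_kbit_training(model)
model.gradient_checkpointing_enable()

# LoRA adapter config (rank/alpha/target modules are assumptions)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```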

Thank you
