Llama 2 & 8K Training

delewis · August 4, 2023, 2:26am

Hello,

I am using SFTTrainer & bitsandbytes to fine-tune Llama-2-7B on a dataset where the inputs are consistently 5K-8K tokens. I am using an A10G and have no problems setting max_seq_length to 2K or 4K, but whenever I set it to 5K+ I run out of VRAM. Also worth mentioning: I am using RoPE scaling to hopefully accommodate the larger context.

Do I need to set max_seq_length to 8K to train effectively on this dataset? Also, what is the relationship between max_seq_length and num_of_sequences? Is there any way to accommodate a larger sequence length like 8K in the VRAM I have available? I thought I could do it by decreasing num_of_sequences, but that doesn’t seem to have any effect on the amount of VRAM being reserved.
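
To make the max_seq_length vs. num_of_sequences question concrete, here is a rough sketch of how the two settings seem to relate. It assumes that num_of_sequences only sizes the text buffer used when packing examples (buffer characters = max_seq_length * chars_per_token * num_of_sequences, which is how TRL's ConstantLengthDataset appears to use it), while the number of tokens in each forward/backward pass depends on max_seq_length and the per-device batch size. The helper names below are hypothetical and are not part of TRL.

```python
# Hypothetical helpers for illustration only; these are not TRL functions.

def packing_buffer_chars(max_seq_length, num_of_sequences=1024, chars_per_token=3.6):
    """Approximate size (in characters) of the CPU-side text buffer used for packing.

    Assumption: the buffer is sized as
    max_seq_length * chars_per_token * num_of_sequences.
    """
    return max_seq_length * chars_per_token * num_of_sequences


def tokens_per_step(max_seq_length, per_device_batch_size=1):
    """Tokens processed on the GPU in one forward/backward pass.

    This count, not num_of_sequences, is what activation memory scales with.
    """
    return max_seq_length * per_device_batch_size


if __name__ == "__main__":
    # Shrinking num_of_sequences shrinks the text buffer,
    # but the tokens per GPU step stay the same.
    for n in (1024, 64):
        print(n, packing_buffer_chars(8192, n), tokens_per_step(8192, 1))
```

If that assumption holds, lowering num_of_sequences would not be expected to change reserved VRAM. The settings that usually matter for fitting longer sequences are the per-device batch size and gradient checkpointing (for example, gradient_checkpointing=True in TrainingArguments).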

Thank you!
