This question came to mind while watching an LLM training tutorial: if the max sequence length is set smaller than the length of some of the sequences in your training set, those sequences have to be broken into smaller pieces. My question: if this happens to a sequence of length n, splitting it at some index m < n (say m = n/2), isn't the conditional p(x_{m+1} | x_1, ..., x_m) never trained, since no training example contains that boundary transition? But in principle p(x_{m+1} | x_1, ..., x_m) should be optimized too, right?
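A toy sketch of what I mean (plain Python, not from any real training library; `chunk` and `lm_training_pairs` are made-up helpers just for illustration):

```python
# Illustration of the question: when a length-n sequence is split into
# non-overlapping chunks of length m, the prediction "token m+1 given
# tokens 1..m" never appears as a training pair, because each chunk only
# produces next-token targets for positions inside itself.

def chunk(tokens, max_len):
    """Split a token sequence into non-overlapping chunks of at most max_len."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

def lm_training_pairs(chunks):
    """Yield (context, target) pairs as produced by a standard causal-LM loss;
    the context never crosses a chunk boundary."""
    for c in chunks:
        for t in range(1, len(c)):
            yield tuple(c[:t]), c[t]

tokens = list(range(1, 9))   # toy sequence of length n = 8
chunks = chunk(tokens, 4)    # max sequence length m = 4 -> [[1,2,3,4], [5,6,7,8]]

pairs = list(lm_training_pairs(chunks))
print(pairs)

# The pair (context=(1, 2, 3, 4), target=5) is absent: the boundary
# conditional p(x_{m+1} | x_1, ..., x_m) is never optimized under naive chunking.
assert ((1, 2, 3, 4), 5) not in pairs
```

So under this kind of naive chunking, the loss term at the chunk boundary simply doesn't exist in any batch.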