This question came to mind while watching an LLM training tutorial: if the max sequence length is set smaller than the length of some of the sequences in your training set, those sequences have to be broken into smaller pieces. My question: if this happens to a sequence of length n, splitting it at some index m < n (say m = n/2), isn't the conditional p(x_{m+1} | x_1, ..., x_m) never trained, since no training example contains that boundary transition? But in principle p(x_{m+1} | x_1, ..., x_m) should be optimized too, right?
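A toy sketch of what I mean (plain Python, not from any real training library; `chunk` and `lm_training_pairs` are made-up helpers just for illustration):

```python
# Illustration of the question: when a length-n sequence is split into
# non-overlapping chunks of length m, the prediction "token m+1 given
# tokens 1..m" never appears as a training pair, because each chunk only
# produces next-token targets for positions inside itself.

def chunk(tokens, max_len):
    """Split a token sequence into non-overlapping chunks of at most max_len."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

def lm_training_pairs(chunks):
    """Yield (context, target) pairs as produced by a standard causal-LM loss;
    the context never crosses a chunk boundary."""
    for c in chunks:
        for t in range(1, len(c)):
            yield tuple(c[:t]), c[t]

tokens = list(range(1, 9))   # toy sequence of length n = 8
chunks = chunk(tokens, 4)    # max sequence length m = 4 -> [[1,2,3,4], [5,6,7,8]]

pairs = list(lm_training_pairs(chunks))
print(pairs)

# The pair (context=(1, 2, 3, 4), target=5) is absent: the boundary
# conditional p(x_{m+1} | x_1, ..., x_m) is never optimized under naive chunking.
assert ((1, 2, 3, 4), 5) not in pairs
```

So under this kind of naive chunking, the loss term at the chunk boundary simply doesn't exist in any batch.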