Trade-off between max_length and loss

agneet · April 23, 2023, 6:37pm

Hello, I am fine-tuning GPT-J-6B with each input example set to 2048 tokens. My loss is decreasing, but very slowly. Could the 2048-token length be the issue? Is there a correlation between max_length and the loss? Should I reduce max_length to something smaller (e.g. 512) and try again?
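
For anyone wanting to try the 2048 vs 512 comparison, here is a minimal sketch of how it could be set up fairly. It assumes the loss being watched is the mean cross-entropy per token (the usual convention for causal LM fine-tuning), so that runs with different block sizes are comparable as long as they see the same tokens. The helper names `chunk_tokens` and `mean_token_loss` are hypothetical and the token stream is toy data; this is not the actual training code.

```python
import math


def chunk_tokens(token_ids, block_size):
    """Split a flat list of token ids into consecutive blocks of
    `block_size` tokens, dropping any incomplete tail block."""
    n_full = len(token_ids) // block_size
    return [token_ids[i * block_size:(i + 1) * block_size] for i in range(n_full)]


def mean_token_loss(token_log_probs):
    """Mean negative log-likelihood per token (assumed loss convention)."""
    return -sum(token_log_probs) / len(token_log_probs)


# Toy token stream standing in for the tokenized training corpus.
stream = list(range(4096))

long_blocks = chunk_tokens(stream, 2048)   # current setup
short_blocks = chunk_tokens(stream, 512)   # proposed shorter setup

# Both chunkings cover the same number of tokens, so a per-token mean
# loss can be compared between the two runs.
tokens_long = sum(len(b) for b in long_blocks)
tokens_short = sum(len(b) for b in short_blocks)
```

Note that with shorter blocks the number of examples per epoch goes up by the same factor, so comparing the two runs per token seen (not per step) is probably the fairer view.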
