Max_length parameter in T5

I am trying to finetune a set of T5 models and it is going well.

However, I have noticed a “max_length” parameter showing up in the config parameters in W&B. I am using this example script for summarization.

I thought that during training the model keeps predicting tokens autoregressively until the EOS token is generated. If the model only predicts a maximum of 20 tokens during training, that might explain why my validation loss is lower than my training loss (I am using generation_max_length = 80 for the validation metrics).

I have not found any information about this parameter in the documentation, or any indication of where it comes from.

If anyone is familiar with the parameter, I would appreciate an explanation of it.

Thanks in advance

After some further research, it seems the parameter comes from PretrainedConfig in configuration_utils.py.

I am still not sure if this parameter is used during training or what effect it has.


Hi navjordj, I also use T5 and W&B, and changed max_length to another value. For me, max_length=20 also keeps appearing and I don't know why. Did you find out anything else so far?

Hello tsei902! :hugs:

I discovered that the parameter originates from configuration_utils.py, which is inherited by the T5 configuration.

Upon examining the source code, I couldn't find any instances where this parameter is actually used. My best guess is that if generation_max_length is not provided, generation falls back to max_length.
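To make the guess concrete, here is a minimal plain-Python sketch of the fallback behaviour being described. The function name and structure are illustrative, not the actual transformers internals; the only fact taken from the thread is that PretrainedConfig defaults max_length to 20:

```python
# Illustrative sketch (not transformers source): generation_max_length,
# when set, takes priority; otherwise the config's max_length is used,
# which PretrainedConfig defaults to 20.

CONFIG_DEFAULT_MAX_LENGTH = 20  # default set in PretrainedConfig


def effective_max_length(generation_max_length=None,
                         config_max_length=CONFIG_DEFAULT_MAX_LENGTH):
    """Return the max_length that generation would actually use."""
    if generation_max_length is not None:
        return generation_max_length
    return config_max_length


print(effective_max_length(80))  # explicit value wins -> 80
print(effective_max_length())    # falls back to config default -> 20
```

If this is right, it would explain why max_length=20 still shows up in the logged W&B config even when a different value is passed at generation time.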

Please let me know if you find out whether the parameter is used or has any effect.

Yes, I found that now too. It seems the parameter is a T5 default and doesn't get overwritten in the W&B config, even though I pass another max_length value during generation. I.e., the default of max_length=20 is only used if no other value is given.