Fine-tuning gpt2 generates repetitive text despite many hyperparameter settings; any luck with gpt2-large/xl?

Hello,

It seems that using the run_clm.py training script overfits my dataset. After I train a model for 1 epoch at learning rates between 1e-6 and 1e-4, it only produces the same sentence repeated until the end of the block. A sample run looks like this:

```
python run_clm.py \
  --model_name_or_path gpt2-medium \
  --train_file train.txt \
  --do_train \
  --output_dir output-gpt2 \
  --per_device_train_batch_size=1 \
  --save_steps 1000 \
  --num_train_epochs=1
```

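For reference, this is roughly how I sample from the fine-tuned checkpoint to see the repetition (the prompt and generation settings below are placeholders rather than my exact script):

```python
# Rough sketch of how I inspect the fine-tuned model's output.
# The prompt and decoding settings are placeholders, not my exact script.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("output-gpt2")
model = GPT2LMHeadModel.from_pretrained("output-gpt2")

inputs = tokenizer("So, what do you want to do tonight?", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=100,
    do_sample=True,  # sampling instead of pure greedy decoding
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```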
My training file is processed from the Cornell Movie-Dialogs Corpus and is formatted line by line (one utterance per line).
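In case the preprocessing matters, this is roughly how I build train.txt (the field separator and file encoding of movie_lines.txt are my assumptions from the corpus layout):

```python
# Rough sketch of my preprocessing: one utterance per line in train.txt.
# Assumes movie_lines.txt uses the " +++$+++ " field separator with the
# utterance text in the last field, and is ISO-8859-1 encoded (my assumption).
with open("movie_lines.txt", encoding="iso-8859-1") as src, \
     open("train.txt", "w", encoding="utf-8") as dst:
    for line in src:
        fields = line.rstrip("\n").split(" +++$+++ ")
        text = fields[-1].strip()
        if text:
            dst.write(text + "\n")
```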

Any advice would be awesome. I’ll be using the model downstream, so I’m looking to get a nice low loss and move on.

Additionally, has anyone been able to train gpt2-large or, ideally, gpt2-xl? Can gpt2-large be distributed across a Colab TPU with batch_size=1?
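For context, this is roughly what I was planning to try for gpt2-large on a single GPU before looking at TPUs; the memory-saving flag values are guesses and I haven’t confirmed this fits at batch size 1:

```
# Rough sketch only; flag values are guesses, memory footprint unverified.
python run_clm.py \
  --model_name_or_path gpt2-large \
  --train_file train.txt \
  --do_train \
  --output_dir output-gpt2-large \
  --per_device_train_batch_size=1 \
  --gradient_accumulation_steps 8 \
  --fp16 \
  --block_size 512 \
  --num_train_epochs=1
```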

Thank you!