Train GPT2 on wikitext from scratch

Hello everyone,

I would like to train GPT-2 on wikitext from scratch (not fine-tune a pre-trained model). I launched the following script from the language-modeling examples folder.

python run_clm.py \
    --model_type gpt2 \
    --tokenizer_name gpt2 \
    --block_size 256 \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --do_train \
    --do_eval \
    --overwrite_output_dir \
    --num_train_epochs 1 \
    --output_dir /tmp/test-clm

Now I have two questions:
1- Is what I did indeed a correct approach to training GPT-2 from scratch?
2- What hyperparameters should I use for this task? (As far as I can tell, the hyperparameters suggested in the existing examples in the Hugging Face repo are for fine-tuning a pre-trained model.)

I can confirm the command is correct if you want to train from scratch. As for hyperparameters, you will need to tune them a bit, but the defaults should not be too bad.
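In case it helps, the training hyperparameters are exposed as regular Trainer command-line flags in run_clm.py, so you can override the defaults without touching the script. A rough sketch of what that looks like; the values below are only illustrative starting points, not tuned for wikitext:

python run_clm.py \
    --model_type gpt2 \
    --tokenizer_name gpt2 \
    --block_size 256 \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --do_train \
    --do_eval \
    --num_train_epochs 10 \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --learning_rate 5e-4 \
    --lr_scheduler_type cosine \
    --warmup_steps 500 \
    --weight_decay 0.01 \
    --overwrite_output_dir \
    --output_dir /tmp/test-clm

You would still want to monitor the eval loss and adjust the learning rate, batch size, and number of epochs from there.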


Thanks for your time and reply. I was also wondering how many epochs you would suggest for training from scratch.

Is there any update in this thread? I would like to know how many epochs are suggested for training GPT-2 on wikitext from scratch.

I have a custom vocab and for this reason want to use the BertWordPieceTokenizer. How would I do that with the new run_clm.py script?
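One approach that should work with run_clm.py's --tokenizer_name argument is to train the WordPiece tokenizer with the tokenizers library, save it in the transformers format, and point the script at that directory. A minimal sketch, assuming your raw text lives in train.txt and using an illustrative my_tokenizer output directory:

import os
from tokenizers import BertWordPieceTokenizer
from transformers import BertTokenizerFast

os.makedirs("my_tokenizer", exist_ok=True)

# Train a WordPiece vocab on the raw text file(s).
wp_tokenizer = BertWordPieceTokenizer(lowercase=True)
wp_tokenizer.train(files=["train.txt"], vocab_size=30_000, min_frequency=2)
wp_tokenizer.save_model("my_tokenizer")  # writes my_tokenizer/vocab.txt

# Wrap the vocab in a transformers fast tokenizer and save the full
# tokenizer config so the directory can be loaded by run_clm.py.
hf_tokenizer = BertTokenizerFast(vocab_file="my_tokenizer/vocab.txt", do_lower_case=True)
hf_tokenizer.save_pretrained("my_tokenizer")

Then pass --tokenizer_name my_tokenizer to run_clm.py. You may also want to check that the model config's vocab_size matches the tokenizer's vocabulary, since the default GPT-2 config assumes the 50k BPE vocab.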

@Hamidreza did you find out how long GPT-2 takes to train from scratch?