Hello everyone,
I would like to train GPT2 on wikitext from scratch (not fine-tune a pre-trained model). I launched the following command in this folder:
```bash
python run_clm.py \
    --model_type gpt2 \
    --tokenizer_name gpt2 \
    --block_size 256 \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --do_train \
    --do_eval \
    --overwrite_output_dir \
    --num_train_epochs 1 \
    --output_dir /tmp/test-clm
```
Now I have two questions:
1- Is what I did above actually a correct approach to training GPT2 from scratch? (My understanding of what the script does in this case is sketched below this list.)
2- What hyperparameters should I use for this task? (As far as I can tell, the hyperparameters suggested in the existing examples in the huggingface repo are for fine-tuning a pre-trained model.)
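For reference, here is what I assume is happening under the hood when `--model_name_or_path` is omitted and only `--model_type gpt2` is given: the model is built from the default GPT-2 config with randomly initialized weights, while the tokenizer is still the pretrained GPT-2 BPE tokenizer. This is just my reading of the script, so please correct me if it's wrong:

```python
from transformers import GPT2Config, GPT2LMHeadModel, GPT2TokenizerFast

# What I assume --model_type gpt2 (without --model_name_or_path) amounts to:
# the default GPT-2 architecture with randomly initialized weights.
config = GPT2Config()
model = GPT2LMHeadModel(config)

# The tokenizer, by contrast, is loaded from the pretrained "gpt2" vocabulary,
# since --tokenizer_name gpt2 is passed.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

print(f"Model parameters (randomly initialized): {model.num_parameters():,}")
```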