How to train GPT-2 from scratch? (no fine-tuning)

Hi @iamnotapenguin, the place I would start is by adapting the following causal language modelling script to your dataset: transformers/run_clm.py at master · huggingface/transformers · GitHub

This script lets you specify both the tokenizer and the model architecture, and it supports multi-GPU training, which is advisable if you’re training from scratch.
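Under the hood, training from scratch just means building the model from a config rather than a pretrained checkpoint. As a rough sketch (the layer/head/embedding sizes here are illustrative, not what the script defaults to), that looks like:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# A small, illustrative GPT-2 config -- randomly initialized,
# no pretrained weights are downloaded or loaded.
config = GPT2Config(n_layer=6, n_head=8, n_embd=512)
model = GPT2LMHeadModel(config)

print(f"Parameters: {model.num_parameters():,}")
```

In `run_clm.py` itself, you get this behaviour by passing `--model_type gpt2` (together with a tokenizer via `--tokenizer_name`) instead of `--model_name_or_path`, so the script initializes fresh weights rather than loading a checkpoint.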

Hope that helps!
