How to train GPT-2 from scratch? (no fine-tuning)

Hi, I would like to train GPT-2 from scratch. I don’t want to fine-tune an existing model — I want to actually train one from scratch with my own tokenizer. How can I do that?


Hi @iamnotapenguin, the place I would start is by adapting the following script for causal language modelling to your dataset: transformers/ at master · huggingface/transformers · GitHub

This script lets you specify both the tokenizer and the model architecture, and it supports multi-GPU training, which is advisable when training from scratch.
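If it helps, here is a minimal sketch of the core idea that script automates: building a fresh, randomly initialised GPT-2 from a config (rather than `from_pretrained`, which would load pretrained weights) sized to match your own tokenizer. All the sizes below are placeholder assumptions for illustration — set `vocab_size` to your tokenizer's actual vocabulary size and scale the other dimensions to your compute budget.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Build a fresh GPT-2 from a config — no pretrained checkpoint is loaded,
# so the weights are randomly initialised (true training from scratch).
config = GPT2Config(
    vocab_size=8000,   # must match your own tokenizer's vocab size (assumption)
    n_positions=512,   # maximum sequence length
    n_embd=256,        # hidden size — deliberately small, illustrative values
    n_layer=4,
    n_head=4,
)
model = GPT2LMHeadModel(config)

# Sanity check: one forward pass on dummy token ids. Passing labels makes
# the model return a causal language-modelling loss.
input_ids = torch.randint(0, config.vocab_size, (2, 32))
out = model(input_ids, labels=input_ids)
print(out.loss.item())        # at random init, roughly ln(vocab_size)
print(model.num_parameters())
```

From there you would train it like any causal LM — for example by plugging this config and your tokenizer into the linked script, or by passing the model to `Trainer` with your tokenised dataset.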

Hope that helps!