Simple example of Transformer from scratch?

Stur86 · December 25, 2023, 3:57pm

Is there a full example of how to train an extremely small, simple transformer model (e.g. GPTNeo with only a hundred parameters) entirely from scratch? I’m trying to do this purely for learning purposes, but I keep getting CUDA errors. It can’t be that I’m running out of memory, since the model and datasets are tiny; it’s more likely some kind of indexing error caused by one or more of the parameters I’m setting. I’d like to start from a working example to figure it out. Any suggestions?
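As a hedged aside on the indexing suspicion: one common cause of CUDA errors with tiny models is a token id that falls outside the embedding table, i.e. an id that is negative or not smaller than the model's `vocab_size`. The sketch below is plain Python and only illustrates the check. The helper name `find_out_of_range_ids` and the toy values are assumptions for illustration, not part of any library.

```python
def find_out_of_range_ids(batches, vocab_size):
    """Return (batch_index, position, token_id) for every id outside [0, vocab_size).

    Hypothetical helper: `batches` is a list of token-id lists (the input ids),
    `vocab_size` is the size of the model's embedding table.
    """
    bad = []
    for batch_index, sequence in enumerate(batches):
        for position, token_id in enumerate(sequence):
            if not 0 <= token_id < vocab_size:
                bad.append((batch_index, position, token_id))
    return bad


# Toy input: with a 16-entry vocabulary, the ids 16 and -1 are invalid.
print(find_out_of_range_ids([[0, 1, 2], [3, 16, -1]], vocab_size=16))
```

Running the same training step on CPU is also worth trying, since the out-of-range index then tends to surface as an ordinary Python exception rather than an asynchronous CUDA failure.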

nielsr · December 25, 2023, 8:49pm

The simplest example is the nanoGPT project from Karpathy: karpathy/nanoGPT on GitHub (“The simplest, fastest repository for training/finetuning medium-sized GPTs”).

See also the Hugging Face training scripts for causal language modeling, which are a bit more extensive and feature-complete: examples/pytorch/language-modeling in the huggingface/transformers repository on GitHub.

Stur86 · December 25, 2023, 9:18pm

Ah, sorry, I did not explain myself clearly. Since we’re in the Hugging Face forums I took it for granted, but I meant the simplest example using the transformers library. I’m getting errors when launching the training with the Trainer class, and I’m not sure I’m setting everything up correctly (datasets, tokenizer, etc.). I guess your second link should be good for this. I’m interested in causal language modeling, so run_clm.py it is for me.
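For readers looking for something smaller than run_clm.py, here is a minimal sketch of the kind of setup discussed in this thread: a very small GPT-Neo model, randomly initialized and trained with the Trainer class. It is not taken from the official scripts. The dataset is synthetic random token ids, so no tokenizer is involved, and the names `VOCAB_SIZE`, `SEQ_LEN`, `RandomTokenDataset`, `build_tiny_model`, `train_tiny_model` and the output directory are all assumptions chosen for illustration. It assumes torch and transformers are installed, along with whatever your transformers version requires for Trainer.

```python
import torch
from torch.utils.data import Dataset
from transformers import GPTNeoConfig, GPTNeoForCausalLM, Trainer, TrainingArguments

VOCAB_SIZE = 16  # hypothetical toy vocabulary size
SEQ_LEN = 8      # hypothetical fixed sequence length


class RandomTokenDataset(Dataset):
    """Synthetic data: random token ids, with labels equal to the inputs."""

    def __init__(self, num_examples=64, seed=0):
        generator = torch.Generator().manual_seed(seed)
        # Every id is in [0, VOCAB_SIZE), so embedding lookups stay in range.
        self.ids = torch.randint(
            0, VOCAB_SIZE, (num_examples, SEQ_LEN), generator=generator
        )

    def __len__(self):
        return self.ids.shape[0]

    def __getitem__(self, index):
        ids = self.ids[index]
        # The causal LM head shifts the labels internally, so labels = input_ids.
        return {"input_ids": ids, "labels": ids.clone()}


def build_tiny_model():
    """Build a randomly initialized (not pretrained) GPT-Neo model."""
    config = GPTNeoConfig(
        vocab_size=VOCAB_SIZE,            # must cover every token id in the data
        max_position_embeddings=SEQ_LEN,  # must be >= the longest sequence
        hidden_size=8,
        num_layers=1,
        num_heads=2,
        attention_types=[[["global"], 1]],  # one global layer, matching num_layers
    )
    return GPTNeoForCausalLM(config)


def train_tiny_model(output_dir="tiny-gpt-neo"):
    """Train for one epoch on the synthetic dataset and return the model."""
    model = build_tiny_model()
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=1,
        per_device_train_batch_size=8,
        report_to="none",  # disable external experiment loggers
    )
    trainer = Trainer(model=model, args=args, train_dataset=RandomTokenDataset())
    trainer.train()
    return model
```

Calling `train_tiny_model()` runs the training loop. With a real tokenizer, the points to double-check are that `vocab_size` in the config is at least the tokenizer's vocabulary size (including any added special tokens) and that no sequence is longer than `max_position_embeddings`.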
