Hey
after watching videos and reading blog posts, I finally managed to build my own script for a GPT-like model*, including the training/testing loop. You can see it here if you need the details; I am not explicitly asking you to fix the code, but if you'd like to, you are more than welcome.
*) a multi-block, multi-head self-attention approach
To check whether everything is working, I'd like to achieve some "quick wins", so I am using a quite small data set (22 MByte).
Still, when I run everything, the losses are quite high (using cross-entropy, the loss stays between 3 and 5).
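For context, if I understand it right, a model that guesses uniformly at random should sit at a cross-entropy of about ln(vocab_size), so whether 3 to 5 is bad depends on my vocabulary. A quick sanity check (the vocab_size of 65 is just a placeholder for a char-level vocabulary, not necessarily what my tokenizer produces):

```python
import math
import torch
import torch.nn.functional as F

# Placeholder: a char-level vocabulary of 65 tokens; substitute whatever
# your tokenizer actually produces.
vocab_size = 65

# Cross-entropy of a uniform guess is ln(vocab_size).
print(math.log(vocab_size))  # ~4.17

# The same baseline, computed the way the training loop would compute it:
logits = torch.zeros(1, vocab_size)            # uniform logits
target = torch.randint(0, vocab_size, (1,))    # any target token
print(F.cross_entropy(logits, target).item())  # ~4.17 as well
```

So a loss of 4 to 5 on a char-level vocabulary would mean the model has barely learned anything, while the same number on a ~50k BPE vocabulary (baseline ln(50257) ≈ 10.8) would already be real progress.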
Also, when I generate a completion from a really small "prompt", the output is bogus.
So I wonder: are there any tricks or hints I should know to get a reasonable result within a "short time"?
Knowing that all parameters are interdependent, I wonder: which hyperparameters could help me here? A lower learning rate? More epochs, a larger block size? Are there any constraints or "thresholds", like "no reasonable result under 72 hours of training" or "no reasonable result with less than 100 MByte of training data"? For reference, I've sketched below the kind of config I've seen in tutorials.
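The char-level configs I've seen (e.g. Karpathy's nanoGPT Shakespeare example) look roughly like this; the exact values are quoted from memory, so please treat them as ballpark assumptions rather than a verified recipe:

```python
# Ballpark hyperparameters from char-level GPT tutorials (e.g. nanoGPT's
# Shakespeare config); quoted from memory, treat as assumptions.
config = dict(
    n_layer=6,           # transformer blocks
    n_head=6,            # attention heads per block
    n_embd=384,          # embedding dimension
    block_size=256,      # context length in tokens
    batch_size=64,
    learning_rate=1e-3,  # the original also decays it over training
    dropout=0.2,
    max_iters=5000,      # reportedly gives Shakespeare-ish text within minutes on one GPU
)
```

Would something in that range make sense for a 22 MByte data set, or is the learning rate / iteration count the more likely culprit?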
I am not trying to build a fully fledged LLM. I just want to reproduce the steps and see simple results, like "Hello" completing to "Hello world", to understand the technology.
tia