Hey
after watching videos and reading blog posts, I finally managed to build my own script for a GPT-like model*, including the training/testing loop. You can see it here if you need the details; I am not explicitly asking you to fix the code, but if you'd like to, you are more than welcome.
*) a multi-block, multi-head self-attention approach
To check whether everything is working, I'd like to achieve some "quick wins", so I am using a quite small data set (22 MByte).
Still, when I run everything, the losses are quite high (using cross-entropy, the loss stays between 3 and 5).
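For context, if I understand it right, a model that guesses uniformly at random should sit at a cross-entropy of about ln(vocab_size), so whether 3 to 5 is bad depends on my vocabulary. A quick sanity check (the vocab_size of 65 is just a placeholder for a char-level vocabulary, not necessarily what my tokenizer produces):

```python
import math
import torch
import torch.nn.functional as F

# Placeholder: a char-level vocabulary of 65 tokens; substitute whatever
# your tokenizer actually produces.
vocab_size = 65

# Cross-entropy of a uniform guess is ln(vocab_size).
print(math.log(vocab_size))  # ~4.17

# The same baseline, computed the way the training loop would compute it:
logits = torch.zeros(1, vocab_size)            # uniform logits
target = torch.randint(0, vocab_size, (1,))    # any target token
print(F.cross_entropy(logits, target).item())  # ~4.17 as well
```

So a loss of 4 to 5 on a char-level vocabulary would mean the model has barely learned anything, while the same number on a ~50k BPE vocabulary (baseline ln(50257) ≈ 10.8) would already be real progress.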
Also, when I generate a completion from a really small "prompt", the output is bogus.
So I wonder: are there any tricks or hints I should know to get a reasonable result within a "short time"?
Knowing that all parameters are interdependent, I wonder: which hyperparameters could help me here? A lower learning rate? More epochs, a larger block size? Are there any constraints or "thresholds", like "no reasonable result under 72 hours of training" or "no reasonable result with less than 100 MByte of training data"? For reference, I've sketched below the kind of config I've seen in tutorials.
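The char-level configs I've seen (e.g. Karpathy's nanoGPT Shakespeare example) look roughly like this; the exact values are quoted from memory, so please treat them as ballpark assumptions rather than a verified recipe:

```python
# Ballpark hyperparameters from char-level GPT tutorials (e.g. nanoGPT's
# Shakespeare config); quoted from memory, treat as assumptions.
config = dict(
    n_layer=6,           # transformer blocks
    n_head=6,            # attention heads per block
    n_embd=384,          # embedding dimension
    block_size=256,      # context length in tokens
    batch_size=64,
    learning_rate=1e-3,  # the original also decays it over training
    dropout=0.2,
    max_iters=5000,      # reportedly gives Shakespeare-ish text within minutes on one GPU
)
```

Would something in that range make sense for a 22 MByte data set, or is the learning rate / iteration count the more likely culprit?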
I am not trying to build a fully fledged LLM. I just want to reproduce the steps and see simple results, like "Hello" completing to "Hello world", to understand the technology.
tia