How do I create a simple and reproducible training process for a GPT-like model?

Hmm… well, the answer is probably simple. I created a dummy dataset containing n repetitions of “Hello World!”. This results in a low loss, so at least my training algorithm seems to work, though I wonder if there are other ways to achieve this. Still, the model is not very accurate; this is how it completes “Hello” (a minimal sketch of the setup follows the samples below):

Hello 
Helo Wor
Hello 
Helorld!
Hello 
Helorlo 
Hellorld! 
Held
Hello Wo 
Held!
Hellorlorlorld!
Hellorld! Wo
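
For reference, here is a minimal sketch of what I mean by the setup, assuming a character-level tokenizer and PyTorch; the seed, the value of n, and names like block_size and get_batch are placeholders of my own, not from any particular library:

```python
import random

import torch

SEED = 42  # hypothetical seed; fixing all RNGs makes runs reproducible
random.seed(SEED)
torch.manual_seed(SEED)

n = 1000                    # number of repetitions (assumption; only "n" above)
text = "Hello World! " * n  # the dummy corpus

# Character-level vocabulary: each distinct character is one token.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}  # for decoding samples back to text

data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)
block_size = 16  # context length; must be shorter than the corpus

def get_batch(batch_size=32):
    """Sample (input, target) pairs; targets are inputs shifted one step."""
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in ix])
    y = torch.stack([data[i + 1 : i + 1 + block_size] for i in ix])
    return x, y
```

One thing worth checking on the accuracy side: if the completions above were drawn with temperature sampling, that alone can produce noisy output like “Helorld!” even at a low training loss, whereas greedy decoding (taking the argmax at each step) is deterministic and should reproduce the memorized string exactly once the model has overfit.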