Hmm… well, the answer is probably simple. I created a dummy data set containing n repetitions of “Hello World!”. Training on it results in a low loss, so at least my algo seems to work. I still wonder if there are other ways to achieve that, though. The model is not very accurate, however; this is how it completes “Hello”:
Hello
Helo Wor
Hello
Helorld!
Hello
Helorlo
Hellorld!
Held
Hello Wo
Held!
Hellorlorlorld!
Hellorld! Wo
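
For anyone who wants to try the same sanity check, here is a minimal sketch of how such a dummy data set could be built. It assumes a character-level model trained on next-character prediction, which is only my guess from the completions above. All the names (`make_dummy_dataset`, `build_vocab`, `next_char_pairs`, `context_len`) and the value n = 100 are made up for illustration, not taken from my actual code.

```python
def make_dummy_dataset(n, phrase="Hello World!"):
    """Return the phrase repeated n times as one long training string."""
    return phrase * n


def build_vocab(text):
    """Map each distinct character to an integer id and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """Turn a string into a list of character ids."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Turn a list of character ids back into a string."""
    return "".join(itos[i] for i in ids)


def next_char_pairs(ids, context_len):
    """Slide a window over ids: (context of context_len ids, next id)."""
    return [
        (ids[i:i + context_len], ids[i + context_len])
        for i in range(len(ids) - context_len)
    ]


# Hypothetical settings: n=100 repetitions, 5 characters of context.
text = make_dummy_dataset(n=100)
stoi, itos = build_vocab(text)
ids = encode(text, stoi)
pairs = next_char_pairs(ids, context_len=5)

# Inspect the first training example: its context and the character to predict.
context, target = pairs[0]
print(repr(decode(context, itos)), "->", repr(itos[target]))
```

Because the text is perfectly periodic, a model that fits these pairs well should continue “Hello” with “ World!” every time, so garbled completions like the ones above point to underfitting or to a problem in the sampling step rather than in the data.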