Long text generation

I have some questions about different models for text generation, and I would be thrilled to hear your answers.
Right now, I'm working on a text generation task for non-English languages (for now, Arabic and Persian).
First, which architecture is best suited to this task? So far I've tested GPT-2, BERT, and AWD-LSTM, and GPT-2 gave the best results, although I had to train the small model because of limited resources. Are there any alternatives I should consider? And which one do you think works best for text generation?

Second, for long text generation, which criteria should I consider? How much do the dataset size and the model's context size affect this? And does the model architecture play a significant role?
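
To make the context-size part of the question concrete, here is a minimal sketch of what I mean, not a real implementation. It assumes generation proceeds one token at a time and that the model only ever sees the last `context_size` tokens. The names `generate_long` and `toy_next_token` are hypothetical, and the toy function is only a stand-in for an actual model such as GPT-2.

```python
from typing import Callable, List


def generate_long(
    next_token: Callable[[List[int]], int],
    prompt: List[int],
    total_len: int,
    context_size: int,
) -> List[int]:
    """Extend `prompt` to `total_len` tokens, showing the model only
    the last `context_size` tokens at each step (a sliding window)."""
    if context_size < 1:
        raise ValueError("context_size must be at least 1")
    tokens = list(prompt)
    while len(tokens) < total_len:
        # Anything earlier than this window is invisible to the model.
        window = tokens[-context_size:]
        tokens.append(next_token(window))
    return tokens


def toy_next_token(window: List[int]) -> int:
    # Hypothetical stand-in for a trained model: the next token is
    # the sum of the visible tokens modulo 10.
    return sum(window) % 10


if __name__ == "__main__":
    prompt = [1, 2, 3]
    print(generate_long(toy_next_token, prompt, 6, context_size=2))
    print(generate_long(toy_next_token, prompt, 6, context_size=3))
```

With the same prompt, the two runs diverge because the smaller window drops the first token. My question is how much this kind of truncation matters in practice for long outputs, compared with dataset size and architecture.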

I know that's a lot of questions, but I would really appreciate help from practitioners like you.

Thanks :slightly_smiling_face:.