Hi! Considering the similarity between the Scandinavian languages, I suggest we might achieve higher performance by utilising data from all of the languages. Just a suggestion. I made a project proposal here: Scandinavian RoBERTa. It uses RoBERTa rather than GPT-2, but I'm not too fussed about the model architecture, to be honest.
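Just to make the idea concrete, here's a minimal sketch of how the shared training data could be assembled: interleave Danish, Norwegian and Swedish text with 🤗 `datasets` before pre-training. The dataset names, configs and sampling probabilities below are only illustrative assumptions, not part of the actual proposal.

```python
# Sketch only: assumes OSCAR configs for da/no/sv; adjust to whatever corpora the project settles on.
from datasets import load_dataset, interleave_datasets

da = load_dataset("oscar", "unshuffled_deduplicated_da", split="train", streaming=True)
no = load_dataset("oscar", "unshuffled_deduplicated_no", split="train", streaming=True)
sv = load_dataset("oscar", "unshuffled_deduplicated_sv", split="train", streaming=True)

# Mix the three languages so no single one dominates the shared
# vocabulary and training signal (probabilities are placeholders).
scandinavian = interleave_datasets([da, no, sv], probabilities=[0.3, 0.3, 0.4], seed=42)

# Quick sanity check on the combined stream.
for example in scandinavian.take(3):
    print(example["text"][:80])
```

The same interleaved corpus could then feed either a RoBERTa-style masked-LM or a GPT-2-style causal-LM run, which is why the architecture choice matters less than pooling the data.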