I haven't tried Marian yet, but it seems interesting. It's for translation, right? Since I used translation mode for my problem, it could definitely work.
I also found excellent pre-trained models on TF Hub, but they are not fine-tunable (according to the page): Transformer-XL models pre-trained on Wiki-40B (a new dataset covering 40 languages), with a separate model for each language. At least for me this would be the ultimate model: seq2seq, unlimited sequence length, and 41 languages. See https://tfhub.dev/google/collections/wiki40b-lm/1