I’m writing to ask what it means that some of the most advanced language models are available on Hugging Face’s model hub. And I’m writing to ask what it means if your language does not appear anywhere in the model hub.
What does it mean that I can download Google’s BERT, OpenAI’s GPT-2 or Facebook’s M2M-100 and run them on my laptop? What does it mean that Hugging Face has made these models so easy to use that I can batch-translate a whole document in just a few lines of code?
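To make "a few lines of code" concrete, here is a minimal sketch. It assumes the `transformers`, `torch` and `sentencepiece` packages are installed and uses the public `facebook/m2m100_418M` checkpoint. The helper names `chunks` and `translate_document`, the batch size and the English-to-French pair are my own illustrative choices, not anything prescribed by the library.

```python
def chunks(items, size):
    """Yield successive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_document(sentences, src_lang="en", tgt_lang="fr", batch_size=8):
    """Translate a list of sentences with M2M-100, one batch at a time."""
    # Imported here so the batching helper above works without the heavy dependencies.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    name = "facebook/m2m100_418M"  # assumption: a small public M2M-100 checkpoint
    tokenizer = M2M100Tokenizer.from_pretrained(name)
    model = M2M100ForConditionalGeneration.from_pretrained(name)
    tokenizer.src_lang = src_lang

    translated = []
    for batch in chunks(sentences, batch_size):
        encoded = tokenizer(batch, return_tensors="pt", padding=True)
        # Forcing the first generated token to the target language code
        # tells the model which language to produce.
        generated = model.generate(
            **encoded,
            forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
        )
        translated.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return translated
```

Calling `translate_document(open("book.txt").read().splitlines())` would download the model on first use and return one translated line per input line. The file name is hypothetical, and real documents would need proper sentence splitting first.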
For the millions of underserved people who speak one of the top 100 languages, this is wonderful! We can translate books, translate Wikipedia and create an endless stream of educational resources.
Meanwhile, the millions of people who don’t speak one of those 100 languages won’t get any new textbooks. For educational materials, they’ll remain dependent on a foreign language – usually the language of their colonizer – until they start translating.
But they can start translating! A few dedicated people could translate enough sentence pairs to train a basic machine translator. Then with back-translation, multilingual translation and other tricks, they could get it up to a respectable quality.
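Back-translation is the easiest of those tricks to picture: take text that exists only in your own language, translate it back into the source language with a rough reverse model, and treat the machine-made source sentence plus the real target sentence as an extra training pair. The sketch below shows only that data step. `back_translate` and `toy_reverse_translator` are hypothetical names, and the word-by-word French lexicon is a stand-in for a trained target-to-source model.

```python
def back_translate(target_sentences, target_to_source):
    """Pair each real target sentence with a machine-made source sentence."""
    return [(target_to_source(sentence), sentence) for sentence in target_sentences]


def toy_reverse_translator(sentence):
    # Stand-in for a trained target-to-source model: a word-by-word lookup.
    lexicon = {"bonjour": "hello", "monde": "world"}
    return " ".join(lexicon.get(word, word) for word in sentence.split())


# A handful of human-translated (source, target) pairs...
human_pairs = [("hello", "bonjour")]

# ...plus synthetic pairs built from target-language text nobody has translated.
synthetic_pairs = back_translate(["bonjour monde"], toy_reverse_translator)

# The forward translator is then retrained on both sets together.
training_pairs = human_pairs + synthetic_pairs
```

The target side of every synthetic pair is real, human-written text, so the forward model still learns to produce fluent output even when the machine-made source side is noisy.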
Will they start translating? Or will those languages wither away because they’re not in today’s top 100? What’s the significance of your language being included in these models? What does it mean if your language is not included?