A service to translate datasets into other languages

Bruno · June 4, 2023, 11:15pm

I have an idea that could help the Hugging Face community develop models in more languages. Currently, we face a significant challenge: while there is a vast amount of data for English, data for other languages is scarce. This limits our ability to create models in different languages and makes it harder to include a wider variety of users.
To overcome this difficulty, I propose a service integrated into Hugging Face Datasets that would allow datasets to be translated directly. The idea is to use an efficient translation model to perform this task. Such a service would encourage and facilitate the development of models in many languages.
This functionality would broaden the reach and usefulness of Hugging Face for users around the world. Imagine being able to train and deploy high-quality models in French, Spanish, Portuguese, Mandarin, and many other languages, opening up new opportunities and promoting greater global inclusion.
I am excited about this idea and would like to discuss implementation details and the necessary tools. I intend to use a robust translation model from Hugging Face's Transformers library, which includes models that perform well on translation tasks. Additionally, we can explore complementary techniques such as data preprocessing and automatic post-editing to further improve the results.
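As a rough illustration of the translation step, here is a minimal sketch, not a finished design. It assumes the `transformers` and `datasets` libraries are installed; the helper names (`make_batch_translator`, `load_translator`), the `_translated` column suffix, and the choice of the `Helsinki-NLP/opus-mt-en-fr` model are only examples picked for illustration, not part of any existing Hugging Face feature.

```python
from typing import Callable, Dict, List

Batch = Dict[str, List[str]]


def make_batch_translator(
    translate_fn: Callable[[List[str]], List[str]],
    columns: List[str],
    suffix: str = "_translated",  # hypothetical naming convention
) -> Callable[[Batch], Batch]:
    """Build a function that translates the given text columns of a batch.

    The returned function takes a dict of column name -> list of strings and
    returns a dict of new columns, which is the shape that
    datasets.Dataset.map(..., batched=True) works with.
    """

    def translate_batch(batch: Batch) -> Batch:
        out: Batch = {}
        for col in columns:
            translated = translate_fn(batch[col])
            # Guard against a translator that drops or adds rows.
            if len(translated) != len(batch[col]):
                raise ValueError(f"translator changed the row count for {col!r}")
            out[col + suffix] = translated
        return out

    return translate_batch


def load_translator(model_name: str = "Helsinki-NLP/opus-mt-en-fr"):
    """Wrap a transformers translation pipeline as a list -> list function."""
    from transformers import pipeline  # imported lazily: heavy dependency

    pipe = pipeline("translation", model=model_name)

    def translate(texts: List[str]) -> List[str]:
        # The translation pipeline returns one dict per input text.
        return [result["translation_text"] for result in pipe(texts)]

    return translate


if __name__ == "__main__":
    from datasets import Dataset

    # Tiny in-memory dataset standing in for a real one from the Hub.
    ds = Dataset.from_dict({"text": ["Hello world.", "How are you today?"]})
    translator = make_batch_translator(load_translator(), columns=["text"])
    ds_fr = ds.map(translator, batched=True, batch_size=8)
    print(ds_fr["text_translated"])
```

Keeping the translator as a plain function makes it easy to swap in a different model or a post-editing step later without touching the dataset code.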
I am happy to exchange ideas and collaborate with anyone interested in this initiative. Together, we can make Hugging Face a truly global platform, enabling users everywhere to benefit from language models.

mariosasko · June 6, 2023, 6:40pm

It makes more sense to implement something like this as a Space on the Hub rather than as a `datasets` feature.
