Collaborative Training Experiment of an ALBERT Model for Bengali

Hugging Face is launching a collaborative experiment with our community to train an ALBERT model for the Bengali language. We are actively looking for participants to help us train the model. :fire:

So what do you need in order to participate?

  1. A Google Colab account
    That’s everything you need.
    [If you would rather use the power of your own GPUs, Hugging Face will also provide a script for that.]

How can you contribute?

  1. If you are a native Bengali speaker, that would be a great help: we are looking for participants to check the performance of the tokenizer, the sentence splitter, etc.

  2. You might want to help us preprocess the dataset. We are using the Bengali Wikipedia dump and the Bengali portion of the OSCAR corpus to train the model; if you have suggestions on preprocessing these, feel free to contribute to that part.

  3. Now the main part: distributed training. You will be provided with a Google Colab script to start the training, and if your kernel crashes, just restart the training script. (Non-native speakers can participate too.)
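To give a feel for the kind of checks mentioned in steps 1 and 2, here is a minimal sketch of a text-cleaning pass and a naive Bengali sentence splitter. This is purely illustrative: the function names are hypothetical and not part of the official Hugging Face scripts, and a real splitter would need to handle abbreviations and mixed punctuation. Bengali commonly ends sentences with the danda (।, U+0964), which is exactly the sort of detail native speakers can help verify.

```python
import re

def clean_text(text: str, min_len: int = 10) -> str:
    """Collapse runs of whitespace and drop very short fragments.
    A toy stand-in for a dataset preprocessing step."""
    text = re.sub(r"\s+", " ", text).strip()
    return text if len(text) >= min_len else ""

def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: split after the danda (।), '?', or '!',
    keeping the delimiter attached to its sentence."""
    parts = re.split(r"(?<=[।?!])\s*", text)
    return [p for p in parts if p]

sample = "আমি বাংলায় গান গাই। তুমি কেমন আছ? ভালো আছি!"
print(split_sentences(sample))
# → ['আমি বাংলায় গান গাই।', 'তুমি কেমন আছ?', 'ভালো আছি!']
```

Checking where such a splitter goes wrong on real Wikipedia and OSCAR text is precisely the kind of feedback the project is asking native speakers for.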

Join our Discord community link -
[A separate Slack channel from Hugging Face will be provided where you will get to know more about the distributed training framework and other related things.]

We are aiming to start this collaborative training experiment on May 7th.
Please do participate in this first Hugging Face collaborative training experiment, especially if you are a native Bengali speaker. :hugs:


Also, I forgot to mention the main thing: thanks to Yandex for creating this collaborative distributed training strategy. Without them, this huge community training event would not be possible. :hugs: