Collaborative Training Experiment of an ALBERT Model for Bengali

Hugging Face is launching a collaborative experiment with our community to train an ALBERT model for the Bengali language. We are actively looking for participants to help us train the model. :fire:

So what do you need in order to participate?

  1. A Google Colab account
    That’s everything you need.
    [If you would rather use the power of your own GPUs, Hugging Face will also provide a script for that.]

How can you contribute?

  1. If you are a native Bengali speaker, that would be a great help: we are looking for participants to check the performance of the tokenizer, the sentence splitter, etc.

  2. You might want to help us preprocess the dataset. We are using the Bengali Wikipedia dump and the Bengali portion of the OSCAR corpus to train the model; if you have suggestions on preprocessing these, feel free to contribute to that part.

  3. Now the main part: distributed training. You will be provided with a Google Colab script to start the training, and if your kernel crashes, just restart the training script. (Non-native speakers can participate too.)
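To give a feel for the kind of checks mentioned in steps 1 and 2, here is a minimal sketch of a text-cleaning pass and a naive Bengali sentence splitter. This is purely illustrative: the function names are hypothetical and not part of the official Hugging Face scripts, and a real splitter would need to handle abbreviations and mixed punctuation. Bengali commonly ends sentences with the danda (।, U+0964), which is exactly the sort of detail native speakers can help verify.

```python
import re

def clean_text(text: str, min_len: int = 10) -> str:
    """Collapse runs of whitespace and drop very short fragments.
    A toy stand-in for a dataset preprocessing step."""
    text = re.sub(r"\s+", " ", text).strip()
    return text if len(text) >= min_len else ""

def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: split after the danda (।), '?', or '!',
    keeping the delimiter attached to its sentence."""
    parts = re.split(r"(?<=[।?!])\s*", text)
    return [p for p in parts if p]

sample = "আমি বাংলায় গান গাই। তুমি কেমন আছ? ভালো আছি!"
print(split_sentences(sample))
# → ['আমি বাংলায় গান গাই।', 'তুমি কেমন আছ?', 'ভালো আছি!']
```

Checking where such a splitter goes wrong on real Wikipedia and OSCAR text is precisely the kind of feedback the project is asking native speakers for.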

Join our Discord community link -
[A separate Slack channel from Hugging Face will be provided where you will get to know more about the distributed training framework and other related things.]

We are aiming to start this collaborative training experiment on May 7th.
Please do participate in this first Hugging Face collaborative training experiment, especially if you are a native Bengali speaker. :hugs:


Also, I forgot to mention the main thing: thanks to Yandex for creating this collaborative distributed training strategy. Without them, this huge community training event would not be possible. :hugs: