Let’s train an even larger model together with Yandex, Hugging Face, and Neuropark!
A few months ago we came together to train sahajBERT. So let’s make it even larger!
Join Neuropark’s Discord community with this link - Neuropark
We are about to start the training on September 2nd.
There will be a few new things to play with besides the 4x scale:
- sahajBERT 2.0 will start from sahajBERT 1.0 using Net2Net model expansion
- we’ll try hybrid training with both GPU and TPU and see how they compare
- and bring along local GPU devices (see below)
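For those curious about the Net2Net expansion mentioned above, here is a minimal sketch of the widening operator (Net2WiderNet) on a toy two-layer ReLU MLP. The `net2wider` function, shapes, and names here are purely illustrative assumptions for demonstration, not the actual sahajBERT 2.0 expansion code: the idea is that new hidden units are copies of randomly chosen old ones, and the outgoing weights are split by the replication count, so the widened network computes exactly the same function as before and training can continue from there.

```python
import numpy as np

def net2wider(W1, b1, W2, new_width, seed=None):
    """Net2WiderNet sketch: widen the hidden layer of a 2-layer MLP
    from W1.shape[0] units to new_width units, function-preserving."""
    rng = np.random.default_rng(seed)
    old_width = W1.shape[0]
    # Mapping g: keep every old unit, then sample extras with replacement.
    extra = rng.integers(0, old_width, size=new_width - old_width)
    g = np.concatenate([np.arange(old_width), extra])
    counts = np.bincount(g, minlength=old_width)  # replicas per old unit
    # Incoming weights and biases are copied verbatim for each replica.
    W1_new, b1_new = W1[g], b1[g]
    # Outgoing weights are divided by the replica count, so the sum over
    # duplicated units reproduces the original layer output exactly.
    W2_new = W2[:, g] / counts[g]
    return W1_new, b1_new, W2_new

def forward(x, W1, b1, W2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0)  # ReLU MLP

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
b1 = rng.normal(size=4)
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)

W1w, b1w, W2w = net2wider(W1, b1, W2, new_width=7, seed=1)
# The widened net (7 hidden units) matches the original (4 hidden units).
assert np.allclose(forward(x, W1, b1, W2), forward(x, W1w, b1w, W2w))
```

Duplicated ReLU units produce identical pre-activations, which is why dividing only the outgoing weights is enough to preserve the output.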
If you have a desktop GPU with ≥6GB memory and ≥30Mbit upload speed, we’d really appreciate it if you could bring it to the collaborative run (we’ll help you with the setup). You can join and leave the training at any time, even if it’s only for a couple of hours.
Also, we’d really appreciate your ideas on the training procedure:
- fine-tuning benchmarks that we should run: anything besides Wikiann and Soham News Category Classification?
- future training runs: we’ll be able to train the model in ~2 weeks. Is there any other task that you would like to pretrain a model for? What data should we use there?
Let me know if you face any issues joining, or have any other questions.
Check out our previous models on Neuropark.
Read the blog post about our previous collaborative training: Deep Learning over the Internet: Training Language Models Collaboratively
Paper: [2106.10207] Distributed Deep Learning in Open Collaborations
Thanks to Yandex and Hugging Face for this initiative. Let’s train 4x!