Potentially try out a number of model architectures (T5/RoBERTa/GPT-2/BigBird…) using datasets such as OSCAR, mC4, GDELT, … and evaluate with a fine-tuned downstream task
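A minimal sketch of pulling the Swahili splits of two of these corpora with the `datasets` library (the exact config names, e.g. `unshuffled_deduplicated_sw` for OSCAR, are assumptions worth verifying on the hub):

```python
from datasets import load_dataset

# Swahili subset of OSCAR (config name assumed; check the hub for the exact spelling)
oscar_sw = load_dataset("oscar", "unshuffled_deduplicated_sw", split="train")

# Swahili subset of mC4, streamed to avoid downloading the full corpus
mc4_sw = load_dataset("mc4", "sw", split="train", streaming=True)

print(oscar_sw)            # dataset summary with size and features
print(next(iter(mc4_sw)))  # one raw example from the streamed mC4 split
```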
I really like this idea, especially since models for Swahili are quite sparse! Do you think we could settle on one model? It might make the project much easier
I guess I wanted to keep it open in case anybody else joined and was keen on a particular Swahili downstream task, but yes, we could settle on a single transformer architecture.
Alright, let me join you in this project - we have way too few models in Swahili, so I’m happy to help you here
Would be awesome if we manage to find other people to join this project - otherwise it’ll be just us two
I think we should first decide on a model architecture. I would suggest either BERT or GPT-2. If we stick with BERT, we should also try to find some good downstream data to fine-tune the model on
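If we go the BERT route, a minimal Flax sketch of the masked-LM pretraining setup could look like this (all sizes below are illustrative placeholders, not decisions; the vocab size in particular depends on the tokenizer we train):

```python
import jax.numpy as jnp
from transformers import BertConfig, FlaxBertForMaskedLM

# Placeholder config for a Swahili BERT; every value here is an assumption
config = BertConfig(
    vocab_size=32_000,       # depends on the tokenizer we train
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
)

# Randomly initialized Flax model, ready for masked-LM pretraining
model = FlaxBertForMaskedLM(config, seed=0, dtype=jnp.float32)

# Sanity check: run a dummy batch through the untrained model
dummy_input = jnp.ones((2, 128), dtype="i4")
outputs = model(input_ids=dummy_input)
print(outputs.logits.shape)  # (2, 128, vocab_size)
```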
And it would be great to find some good datasets in Swahili as well
Feel free to also open a Discord; we can chat there for more details
Will continue to add them here: Flax Swahili Pretraining - Google Sheets
Awesome, added you!