Pretrain Swahili Flax model

Potentially try out a number of model architectures (T5/RoBERTa/GPT-2/BigBird…) using datasets such as OSCAR, mC4, GDELT, … and evaluate with a fine-tuned downstream task.
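To get a feel for the data before we commit to anything, here's a minimal sketch of streaming the Swahili portion of OSCAR with the `datasets` library (assuming its standard `oscar` loader and the `unshuffled_deduplicated_sw` config; mC4's `sw` config should work the same way):

```python
import itertools

from datasets import load_dataset

# Stream the deduplicated Swahili split of OSCAR so nothing has to be
# downloaded up front; streaming=True returns an IterableDataset.
oscar_sw = load_dataset(
    "oscar", "unshuffled_deduplicated_sw", split="train", streaming=True
)

# Peek at the first few documents.
for example in itertools.islice(oscar_sw, 3):
    print(example["text"][:100])
```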

I really like this idea, especially since models for Swahili are quite sparse! Do you think we could settle on one model? It might make the project much easier :slight_smile:

I guess I wanted to keep it open in case anybody else joined and was keen on a particular Swahili downstream task, but yes, we could settle on a single transformer architecture.

Alright, let me join you in this project - we have way too few models in Swahili, so I'm happy to help you here :slight_smile:

Would be awesome if we manage to find other people to join this project - otherwise it'll be just us two :slight_smile:

I think first we should decide on a model architecture. I would suggest either BERT or GPT-2. If we stick with BERT, we should also try to find some good downstream data to fine-tune the model on :slight_smile:
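To make the comparison concrete, here's a hedged sketch of what initializing either architecture from scratch looks like in Flax (these are the standard `transformers` Flax model classes; the vocab size is a placeholder we'd take from whatever Swahili tokenizer we train):

```python
from transformers import (
    BertConfig,
    FlaxBertForMaskedLM,
    GPT2Config,
    FlaxGPT2LMHeadModel,
)

VOCAB_SIZE = 32_000  # placeholder; would come from our own tokenizer

# Masked-LM route: randomly initialized BERT-style encoder.
bert = FlaxBertForMaskedLM(BertConfig(vocab_size=VOCAB_SIZE), seed=0)

# Causal-LM route: randomly initialized GPT-2-style decoder.
gpt2 = FlaxGPT2LMHeadModel(GPT2Config(vocab_size=VOCAB_SIZE), seed=0)
```

Either way, most of the pretraining loop would be the same; the main difference is masked-LM vs. causal-LM data collation.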

And it would be great to find some good datasets in Swahili as well :slight_smile:

Feel free to also open a Discord; we can chat there for more details :slight_smile:

I'll continue to add them here: Flax Swahili Pretraining - Google Sheets

Awesome, added you!
