Potentially try out a number of model architectures (T5/RoBERTa/GPT-2/BigBird…) using datasets such as OSCAR, mC4, GDELT, … and test with a fine-tuned task
I really like this idea, especially since models on Swahili are quite sparse! Do you think we could settle on one model? It might make the project much easier
I guess I wanted to keep it open in case anybody else joined and was keen on a particular Swahili downstream task, but yes, we could settle on a single transformer architecture.
Alright, let me join you in this project - we have far too few models in Swahili, so I'm happy to try to help you here
Would be awesome if we manage to find other people to join this project - otherwise it'll be just us two
I think we should first decide on a model architecture. I would suggest either BERT or GPT-2. If we stick with BERT, we should also try to find some good downstream data to fine-tune the model on
And it would be great to find some good datasets in Swahili as well
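Once we pick a dataset, a common first preprocessing step for MLM pretraining is concatenating raw text and splitting it into fixed-length blocks. Here is a minimal sketch of that grouping step - whitespace splitting stands in for a real subword tokenizer (e.g. one trained on the Swahili corpus), and the toy sentences are just placeholders, so treat this as an illustration rather than the actual pipeline:

```python
# Sketch: group raw sentences into fixed-length blocks for MLM pretraining.
# Whitespace "tokenization" is a stand-in for a trained subword tokenizer.

def group_texts(sentences, block_size=8):
    """Concatenate tokenized sentences, then split into equal-size blocks."""
    tokens = [tok for s in sentences for tok in s.split()]
    # Drop the tail remainder so every block has exactly block_size tokens,
    # as the common language-modeling preprocessing recipes do.
    total = (len(tokens) // block_size) * block_size
    return [tokens[i:i + block_size] for i in range(0, total, block_size)]

# Toy Swahili lines purely for illustration
corpus = [
    "Habari ya leo rafiki yangu",
    "Ninapenda kusoma vitabu vya historia",
    "Lugha ya Kiswahili inazungumzwa Afrika Mashariki",
]
blocks = group_texts(corpus, block_size=8)
print(len(blocks), [len(b) for b in blocks])  # two blocks of 8 tokens
```

In a real run the same grouping would be applied over the full corpus (e.g. via a map over a `datasets` dataset) before random masking is applied at training time.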
Feel free to also open a discord, we can chat there for more details
Will continue to add them here - Flax Swahili Pretraining - Google Sheets
I will also join this, please add me to the project.
I was hoping to train RoBERTa or other MLMs on another East African language, Tigrinya, which is spoken in Eritrea and Northern Ethiopia and is far less represented than Swahili, but it will be great to join forces here and learn with you all. Or maybe we can consider both, if time allows.
Awesome, added you!