PreTrain RoBERTa for Kannada

SriharshaHatwar · July 1, 2021, 9:39pm

RoBERTa for Kannada

Currently, only two models for hate speech detection are available on the Hugging Face Model Hub. By pre-training a RoBERTa model, we hope to improve NLP support for Kannada, one of the oldest Indic languages.

2. Language

Kannada

3. Model

A randomly initialized RoBERTa model.

4. Datasets

Here are some datasets containing Kannada sentences (preprocessing required):


5. Training scripts

A masked language modeling script for Flax is available here. The same script can probably be used for this project.
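For anyone new to masked language modeling, here is a rough sketch of the masking step such a script performs on each batch. It is not taken from the linked Flax script. It assumes the usual BERT/RoBERTa recipe: pick about 15% of tokens, then replace 80% of those with the mask token, 10% with a random token, and leave 10% unchanged. The function name `mask_tokens`, the token ids, and the `-100` ignore label are illustrative assumptions.

```python
import random

# Label value meaning "do not score this position" (a common convention, assumed here).
IGNORE_INDEX = -100


def mask_tokens(token_ids, mask_id, vocab_size, special_ids=(), mlm_prob=0.15, seed=None):
    """Return (inputs, labels) for one sequence of token ids.

    Hypothetical helper for illustration only. Positions that are not
    selected for prediction get the label IGNORE_INDEX.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)

    for i, tok in enumerate(token_ids):
        # Never mask special tokens such as <s> or </s>; skip most other tokens too.
        if tok in special_ids or rng.random() >= mlm_prob:
            continue

        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif roll < 0.9:
            # 10%: replace with a random id (may occasionally equal the original)
            inputs[i] = rng.randrange(vocab_size)
        # remaining 10%: keep the original token unchanged

    return inputs, labels


if __name__ == "__main__":
    # Hypothetical ids: 0 = <s>, 2 = </s>, 4 = <mask>
    ids = [0, 11, 12, 13, 14, 15, 2]
    inputs, labels = mask_tokens(ids, mask_id=4, vocab_size=100,
                                 special_ids={0, 2}, mlm_prob=0.5, seed=0)
    print(inputs)
    print(labels)
```

The loss is then computed only at positions whose label is not `-100`. In a real run the ids would come from a tokenizer trained on the Kannada corpus.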

6. Desired project outcome

To fine-tune the pre-trained model on a sentiment analysis task for Kannada sentences.

SriharshaHatwar · July 1, 2021, 9:41pm

More datasets containing Kannada sentences:

SriharshaHatwar · July 1, 2021, 9:42pm

Some more datasets:

patrickvonplaten · July 2, 2021, 3:34pm

Let’s define it!
