RoBERTa for Kannada
Currently, only two models for Kannada are available on the Hugging Face model hub. By pre-training a RoBERTa model, we wish to make language technology more accessible for one of the oldest Indic languages.
2. Language
Kannada
3. Model
A randomly initialized RoBERTa model.
4. Datasets
Several datasets containing Kannada sentences are available; preprocessing will be required before training.
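The exact preprocessing is not decided yet. As one possible step, the sketch below normalizes whitespace and keeps only lines that are mostly written in Kannada script (Unicode block U+0C80 to U+0CFF). The function names `kannada_ratio` and `keep_kannada_lines` and the `min_ratio=0.8` threshold are illustrative assumptions, not part of any existing script.

```python
def kannada_ratio(text):
    """Fraction of non-whitespace characters that fall in the Kannada Unicode block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    kannada = sum(1 for ch in chars if "\u0c80" <= ch <= "\u0cff")
    return kannada / len(chars)


def keep_kannada_lines(lines, min_ratio=0.8):
    """Collapse whitespace and keep lines that are predominantly Kannada.

    min_ratio is a hypothetical threshold; it would need tuning on real data.
    """
    cleaned = []
    for line in lines:
        line = " ".join(line.split())  # collapse runs of whitespace, trim ends
        if line and kannada_ratio(line) >= min_ratio:
            cleaned.append(line)
    return cleaned


if __name__ == "__main__":
    sample = ["  ಕನ್ನಡ   ಭಾಷೆ ", "hello world", ""]
    print(keep_kannada_lines(sample))
```

Punctuation and digits count against the ratio here, so a lower threshold may suit text with many numbers or quotation marks.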
5. Training scripts
A masked language modeling script for Flax is available here. The same script can probably be used with little or no modification.
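For readers unfamiliar with the objective, here is a minimal pure-Python sketch of the token masking that masked language modeling relies on. It assumes the common BERT-style rule (select about 15% of tokens; of those, replace 80% with the mask token, 10% with a random token, and leave 10% unchanged). The function `mask_tokens` and its parameters are hypothetical and only illustrate the idea; the actual script works on token IDs and may differ in detail.

```python
import random

MASK_TOKEN = "<mask>"  # mask token string used by RoBERTa tokenizers


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (inputs, labels) for one masked language modeling example.

    labels[i] holds the original token at selected positions and None elsewhere,
    so the loss would only be computed where labels[i] is not None.
    """
    rng = random.Random(seed)
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_TOKEN          # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)   # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    sentence = "ನಾನು ಕನ್ನಡ ಪುಸ್ತಕ ಓದುತ್ತೇನೆ".split()
    inputs, labels = mask_tokens(sentence, vocab=sentence, seed=0)
    print(inputs)
    print(labels)
```

Calling the function anew each time a sentence is seen gives a different mask per epoch, which matches the dynamic masking described for RoBERTa.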
6. Desired project outcome
To fine-tune the pre-trained model on a sentiment analysis task for Kannada sentences.