Sentiment analysis of Sinhala language using deep learning networks

graw · June 30, 2021, 4:29pm


1. Description

The main objective of the project is to test deep learning models for identifying sentiment in Sinhala text. A Facebook dataset is used to train and test the models. The model code has already been developed, so only the training and testing phases remain. Since Sinhala is still a low-resource language for NLP, this project will help improve the current tools and provide insight into the current state of the field.
Another objective is to migrate the current code to JAX using Flax, Haiku and other libraries. We also aim to test libraries such as Trax, which provides basic deep learning models, and the increasingly popular Transformers library.
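As a rough illustration of what the migration to JAX involves, the sketch below writes one training step for a deliberately simple classifier in plain JAX (no Flax or Haiku), so the functional style is visible: parameters live in a dict, gradients come from `jax.value_and_grad`, and the update is a pure function compiled with `jax.jit`. The mean-of-embeddings model, vocabulary size, embedding width, three sentiment classes and learning rate are all assumptions for illustration, not details of the project's existing code.

```python
import jax
import jax.numpy as jnp

# Hypothetical sizes and learning rate; not taken from the project.
VOCAB_SIZE, EMBED_DIM, NUM_CLASSES = 1000, 32, 3
LEARNING_RATE = 0.1

def init_params(key):
    k1, k2 = jax.random.split(key)
    return {
        "embed": jax.random.normal(k1, (VOCAB_SIZE, EMBED_DIM)) * 0.1,
        "w": jax.random.normal(k2, (EMBED_DIM, NUM_CLASSES)) * 0.1,
        "b": jnp.zeros((NUM_CLASSES,)),
    }

def forward(params, token_ids, mask):
    # token_ids: (batch, seq_len) int32; mask: (batch, seq_len), 1.0 for real tokens, 0.0 for padding
    emb = params["embed"][token_ids]                        # (batch, seq_len, embed_dim)
    summed = (emb * mask[..., None]).sum(axis=1)            # padding contributes nothing
    mean = summed / jnp.maximum(mask.sum(axis=1, keepdims=True), 1.0)
    return mean @ params["w"] + params["b"]                 # logits: (batch, num_classes)

def loss_fn(params, token_ids, mask, labels):
    # Mean cross-entropy over the batch.
    log_probs = jax.nn.log_softmax(forward(params, token_ids, mask))
    return -jnp.mean(jnp.take_along_axis(log_probs, labels[:, None], axis=1))

@jax.jit
def train_step(params, token_ids, mask, labels):
    loss, grads = jax.value_and_grad(loss_fn)(params, token_ids, mask, labels)
    # Plain SGD update applied to every leaf of the parameter pytree.
    params = jax.tree_util.tree_map(lambda p, g: p - LEARNING_RATE * g, params, grads)
    return params, loss
```

In practice the SGD line would likely be replaced by an optimizer library such as Optax, and the model by a Flax or Haiku module; the point here is only the pure-function shape of the training step.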

2. Language

The models are trained on Sinhala text.

3. Model

The models that will be tested are:

RNN
LSTM
GRU
BiLSTM
Baseline models combined with a CNN
Stacked LSTM and BiLSTM
HAHNN
Capsule networks
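To show how the recurrent models in this list could be expressed in JAX, here is a minimal sketch of an LSTM cell driven by `jax.lax.scan`, plus a BiLSTM built by running a second LSTM over the reversed sequence and concatenating the final states. It is not the project's existing code: the function names, uniform initialisation, zero gate biases, right-padded batches and use of the final hidden state as the sentence representation are all assumptions.

```python
import jax
import jax.numpy as jnp

def init_lstm(key, in_dim, hidden):
    k1, k2 = jax.random.split(key)
    scale = 1.0 / hidden ** 0.5
    return {
        # Weights for the four gates (input, forget, candidate, output), stacked along the last axis.
        "wx": jax.random.uniform(k1, (in_dim, 4 * hidden), minval=-scale, maxval=scale),
        "wh": jax.random.uniform(k2, (hidden, 4 * hidden), minval=-scale, maxval=scale),
        "b": jnp.zeros((4 * hidden,)),
    }

def lstm_cell(params, carry, inputs):
    h, c = carry
    x, m = inputs                          # x: (batch, in_dim); m: (batch,), 1.0 = real token
    z = x @ params["wx"] + h @ params["wh"] + params["b"]
    i, f, g, o = jnp.split(z, 4, axis=-1)
    c_new = jax.nn.sigmoid(f) * c + jax.nn.sigmoid(i) * jnp.tanh(g)
    h_new = jax.nn.sigmoid(o) * jnp.tanh(c_new)
    m = m[:, None]
    # At padding positions the state is carried through unchanged.
    h = m * h_new + (1.0 - m) * h
    c = m * c_new + (1.0 - m) * c
    return (h, c), h

def run_lstm(params, x, mask):
    # x: (batch, seq_len, in_dim); mask: (batch, seq_len). Returns final hidden state (batch, hidden).
    batch, hidden = x.shape[0], params["wh"].shape[0]
    init = (jnp.zeros((batch, hidden)), jnp.zeros((batch, hidden)))
    # lax.scan iterates over the leading axis, so move time to the front.
    xs = (jnp.swapaxes(x, 0, 1), jnp.swapaxes(mask, 0, 1))
    (h, _), _ = jax.lax.scan(lambda carry, inp: lstm_cell(params, carry, inp), init, xs)
    return h

def run_bilstm(fwd_params, bwd_params, x, mask):
    h_fwd = run_lstm(fwd_params, x, mask)
    # Reversing puts padding first; the mask keeps the state at zero until real tokens start.
    h_bwd = run_lstm(bwd_params, x[:, ::-1], mask[:, ::-1])
    return jnp.concatenate([h_fwd, h_bwd], axis=-1)   # (batch, 2 * hidden)
```

Masking, rather than slicing each sentence to its true length, keeps every sequence in a batch the same length, which `lax.scan` and `jax.jit` need for fixed shapes. A stacked LSTM would feed the per-step outputs returned by `lax.scan` (discarded above) into another `run_lstm`, and a classifier head would sit on top of the concatenated BiLSTM state.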

4. Dataset

The dataset contains 526,732 Sinhala and English Facebook posts extracted from CrowdTangle, covering a decade's worth of content from Facebook pages popular in Sri Lanka.

5. Training scripts

The following links contain the model scripts:

Main models

6. Challenges

Several models still need to be adjusted and tested.

7. Desired project outcome

Performance measures for each model.
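So that every model is scored the same way, a single evaluation function can be shared across them. The sketch below builds a confusion matrix in JAX and derives accuracy and macro-averaged F1 from it. The function name `sentiment_metrics` is hypothetical, and the choice of these two metrics is an assumption; the proposal does not say which measures will be reported.

```python
import jax.numpy as jnp

def sentiment_metrics(y_true, y_pred, num_classes):
    y_true = jnp.asarray(y_true)
    y_pred = jnp.asarray(y_pred)
    # Confusion matrix: rows = true class, columns = predicted class.
    # .at[...].add accumulates repeated (true, pred) index pairs.
    cm = jnp.zeros((num_classes, num_classes), dtype=jnp.int32).at[y_true, y_pred].add(1)
    tp = jnp.diag(cm).astype(jnp.float32)
    pred_count = cm.sum(axis=0).astype(jnp.float32)
    true_count = cm.sum(axis=1).astype(jnp.float32)
    # Classes never predicted / never present get precision / recall 0 instead of NaN.
    precision = jnp.where(pred_count > 0, tp / jnp.maximum(pred_count, 1.0), 0.0)
    recall = jnp.where(true_count > 0, tp / jnp.maximum(true_count, 1.0), 0.0)
    denom = precision + recall
    f1 = jnp.where(denom > 0, 2 * precision * recall / jnp.where(denom > 0, denom, 1.0), 0.0)
    return {
        "accuracy": float(tp.sum() / cm.sum()),
        "macro_f1": float(f1.mean()),
        "confusion_matrix": cm,
    }
```

For example, with `y_true = [0, 0, 1, 1, 2, 2]` and `y_pred = [0, 1, 1, 1, 2, 0]`, accuracy is 4/6 (about 0.667) and the per-class F1 scores are 0.5, 0.8 and 2/3, giving a macro-F1 of about 0.656. Macro-averaging weights each sentiment class equally, which matters if the classes in the Facebook data are imbalanced.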

8. Reads

The following links may help in understanding the project and what has been done previously:

https://sencat.lk/

graw · June 30, 2021, 5:55pm

If the proposal is good enough, could you please accept this project? @Suzana @valhalla @osanseviero @patrickvonplaten

patrickvonplaten · July 1, 2021, 10:12am

Thanks for the cool proposal @graw !

The project is really cool! Just regarding the models: we don't really have any of those implemented in Transformers, so this might take some time... Would it be sensible to pretrain a RoBERTa model and fine-tune it afterwards, maybe?

Given that the project lasts only a week, implementing and trying out all of those models might be a bit time-consuming.

patrickvonplaten · July 1, 2021, 10:12am

Putting you guys down officially, though.

patrickvonplaten · July 1, 2021, 10:13am

Also do you have links to the dataset?

graw · July 1, 2021, 11:34am

Thank you so much, you are a lifesaver. The Transformer part is fine; I will work on it afterwards. The dataset is not publicly available because of the new Facebook regulations. I can provide it to my teammates but cannot make it public. However, I can add the paper describing the dataset, and if you are interested you can ask the original authors for access.
Dataset paper

