Sentiment analysis of Sinhala language using deep learning networks

1. Description

The main objective of the project is to test deep learning models that identify sentiment in Sinhala text. A Facebook dataset is used to train and test the models. The model code is already written; only the training and testing phases remain. Since Sinhala remains a low-resource language for NLP, this project will help improve the current tools and provide insight into the current state of Sinhala sentiment analysis.
A second objective is to migrate the current code to JAX using Flax, Haiku, and other libraries. We also aim to test libraries such as Trax, with its basic deep learning models, and the currently popular Transformers.

2. Language

The models are trained on Sinhala-language text.

3. Model

The models that will be tested are:

  • RNN
  • LSTM
  • GRU
  • BiLSTM
  • Baseline models combined with a CNN
  • Stacked LSTM and BiLSTM
  • HAHNN
  • Capsule networks

4. Dataset

The project uses a Facebook dataset containing 526,732 Sinhala and English posts extracted from CrowdTangle. The dataset consists of a decade’s worth of content from Facebook pages popular in Sri Lanka.
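Because the dataset mixes Sinhala and English posts, one preprocessing step that may be useful is separating the two by script. The sketch below is only an illustration: it relies on the Unicode Sinhala block (U+0D80 to U+0DFF), while the function names, the sample posts, and the 0.5 threshold are assumptions and not part of the original dataset pipeline.

```python
# Sketch only: names, sample posts and the 0.5 threshold are illustrative
# assumptions, not taken from the dataset's actual preprocessing.

SINHALA_START, SINHALA_END = 0x0D80, 0x0DFF  # Unicode Sinhala block


def sinhala_ratio(text: str) -> float:
    """Share of Sinhala characters among Sinhala characters plus ASCII letters."""
    sinhala = sum(1 for ch in text if SINHALA_START <= ord(ch) <= SINHALA_END)
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    total = sinhala + latin
    # Text with neither script (empty, digits, emoji only) gets a ratio of 0.0.
    return sinhala / total if total else 0.0


def is_mostly_sinhala(text: str, threshold: float = 0.5) -> bool:
    """True when at least `threshold` of the counted characters are Sinhala."""
    return sinhala_ratio(text) >= threshold


# Hypothetical posts standing in for rows of the real dataset.
posts = ["මම සතුටින්", "hello world", "මම happy"]
sinhala_posts = [p for p in posts if is_mostly_sinhala(p)]
```

A ratio rather than a yes/no check leaves room for code-mixed posts, which are common on social media; the threshold would need tuning against the real data.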

5. Training scripts

The following links contain the model scripts:

Main models

6. Challenges

Several models still need to be adjusted and tested.

7. Desired project outcome

Performance measures of each model
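The proposal does not say which performance measures will be reported. As a minimal sketch, assuming the usual classification measures (accuracy plus macro-averaged precision, recall, and F1), a comparison across models could be computed as below. The `evaluate` function and the label names are hypothetical.

```python
from collections import Counter


def evaluate(gold, pred):
    """Return accuracy and macro-averaged precision, recall and F1."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")

    # Per-label true positives, false positives and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1

    labels = sorted(set(gold) | set(pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Hypothetical labels; the real label set depends on the dataset annotation.
gold = ["pos", "neg", "neu", "pos"]
pred = ["pos", "neg", "pos", "pos"]
scores = evaluate(gold, pred)
```

Macro averaging weights every class equally, which tends to matter when sentiment classes are imbalanced; whether that is the right choice here depends on the label distribution.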

8. Reads

The following links can be useful to better understand the project and
what has been done previously.

https://sencat.lk/

If the proposal is good enough, could you please accept this project? @Suzana @valhalla @osanseviero @patrickvonplaten

Thanks for the cool proposal @graw !

The project is really cool :slight_smile: Just regarding the models, we don’t really have any of those implemented in Transformers, so this might take some time… Would it be sensible to pretrain a RoBERTa model and fine-tune it afterwards, maybe?

Given that the project lasts only a week, implementing and trying out all those models might be a bit too time-consuming.

Putting you guys down officially though :slight_smile:

Also do you have links to the dataset? :slight_smile:

Thank you so much, you are a lifesaver. That is fine regarding the Transformer part; I will work on it afterwards. The dataset is not technically available to the public because of the new Facebook regulations. I am able to provide it to my teammates but cannot make it public. However, I can add the paper describing the dataset, and if you are interested you can ask the original authors for access.
Dataset paper
