Sentiment analysis of Sinhala language using deep learning networks

1. Description

The main objective of the project is to test deep learning models for identifying sentiment in Sinhala text. A Facebook dataset is used to train and test the models. The model code has already been developed, so only the training and testing phases remain. Since Sinhala is still a resource-poor language for NLP, this project will help improve the current tools and provide insight into the current state of sentiment analysis for Sinhala.
Another objective is to migrate the current code to JAX using Flax, Haiku, and other libraries. Libraries such as Trax, which provides basic deep learning models, and the increasingly popular Transformers are also to be tested.
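
As a rough illustration of what the JAX migration involves, the sketch below writes a training step in plain JAX (no Flax or Haiku): the parameters are a pytree (a dict of arrays), the loss is differentiated with jax.value_and_grad, and the update is compiled with jax.jit. The bag-of-embeddings model, the vocabulary size, the three sentiment classes, and the learning rate are all hypothetical choices for illustration; this is not one of the project's existing models.

```python
import jax
import jax.numpy as jnp

# Hypothetical sizes and learning rate, chosen only for illustration.
VOCAB_SIZE = 1000
EMBED_DIM = 32
NUM_CLASSES = 3  # assumption: e.g. negative / neutral / positive
LEARNING_RATE = 0.1


def init_params(key):
    """Create a pytree of parameters for a bag-of-embeddings classifier."""
    k_emb, k_out = jax.random.split(key)
    return {
        "embed": jax.random.normal(k_emb, (VOCAB_SIZE, EMBED_DIM)) * 0.1,
        "w": jax.random.normal(k_out, (EMBED_DIM, NUM_CLASSES)) * 0.1,
        "b": jnp.zeros((NUM_CLASSES,)),
    }


def forward(params, token_ids, mask):
    """Average the embeddings of non-padding tokens, then apply a linear layer."""
    emb = params["embed"][token_ids]                         # (batch, seq, dim)
    summed = jnp.sum(emb * mask[..., None], axis=1)          # (batch, dim)
    mean = summed / jnp.maximum(mask.sum(axis=1, keepdims=True), 1.0)
    return mean @ params["w"] + params["b"]                  # (batch, classes) logits


def loss_fn(params, token_ids, mask, labels):
    """Mean softmax cross-entropy over the batch."""
    log_probs = jax.nn.log_softmax(forward(params, token_ids, mask))
    return -jnp.mean(jnp.take_along_axis(log_probs, labels[:, None], axis=1))


@jax.jit
def train_step(params, token_ids, mask, labels):
    """One step of plain gradient descent; returns new params and the pre-update loss."""
    loss, grads = jax.value_and_grad(loss_fn)(params, token_ids, mask, labels)
    new_params = jax.tree_util.tree_map(lambda p, g: p - LEARNING_RATE * g, params, grads)
    return new_params, loss


# Tiny usage example with made-up token ids (0 is treated as padding).
params = init_params(jax.random.PRNGKey(0))
token_ids = jnp.array([[1, 2, 3, 0], [4, 5, 0, 0]])
mask = (token_ids != 0).astype(jnp.float32)
labels = jnp.array([0, 2])
losses = []
for _ in range(50):
    params, loss = train_step(params, token_ids, mask, labels)
    losses.append(float(loss))
```

Keeping the parameters as a plain pytree means the same train_step pattern carries over when the model body is later replaced by a Flax or Haiku module.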

2. Language

The models are trained on Sinhala-language text.

3. Model

The following models will be tested:

  • RNN
  • LSTM
  • GRU
  • BiLSTM
  • Baseline models combined with a CNN
  • Stacked LSTM and BiLSTM
  • HAHNN
  • Capsule networks
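
Several of the listed architectures (LSTM, BiLSTM, stacked LSTM) share the same recurrent building block. The sketch below is a hypothetical pure-JAX LSTM layer built on jax.lax.scan using the standard LSTM gate equations, with a BiLSTM formed by running a second LSTM over the reversed sequence and concatenating the two final hidden states. Layer sizes and initialisation are assumptions, padding masks are omitted for brevity, and this is not the project's existing model code.

```python
import jax
import jax.numpy as jnp


def init_lstm(key, input_dim, hidden_dim):
    """Weights for the four gates (input, forget, cell candidate, output), stacked."""
    k1, k2 = jax.random.split(key)
    scale = 1.0 / hidden_dim ** 0.5
    return {
        "W": jax.random.uniform(k1, (input_dim, 4 * hidden_dim), minval=-scale, maxval=scale),
        "U": jax.random.uniform(k2, (hidden_dim, 4 * hidden_dim), minval=-scale, maxval=scale),
        "b": jnp.zeros((4 * hidden_dim,)),
    }


def lstm_layer(params, xs):
    """Run an LSTM over xs of shape (seq_len, batch, input_dim).

    Returns every hidden state, shape (seq_len, batch, hidden_dim).
    """
    hidden_dim = params["U"].shape[0]
    batch = xs.shape[1]

    def step(carry, x_t):
        h, c = carry
        z = x_t @ params["W"] + h @ params["U"] + params["b"]
        i, f, g, o = jnp.split(z, 4, axis=-1)
        i, f, o = jax.nn.sigmoid(i), jax.nn.sigmoid(f), jax.nn.sigmoid(o)
        g = jnp.tanh(g)
        c = f * c + i * g          # update the cell state
        h = o * jnp.tanh(c)        # expose a gated view of it
        return (h, c), h

    init = (jnp.zeros((batch, hidden_dim)), jnp.zeros((batch, hidden_dim)))
    _, hs = jax.lax.scan(step, init, xs)
    return hs


def bilstm_encode(fwd_params, bwd_params, xs):
    """Concatenate the last forward state with the last backward state."""
    h_fwd = lstm_layer(fwd_params, xs)[-1]            # reads x_1 ... x_T
    h_bwd = lstm_layer(bwd_params, xs[::-1])[-1]      # reads x_T ... x_1
    return jnp.concatenate([h_fwd, h_bwd], axis=-1)   # (batch, 2 * hidden_dim)


# Usage with random inputs standing in for embedded Sinhala tokens.
key_f, key_b, key_x = jax.random.split(jax.random.PRNGKey(0), 3)
xs = jax.random.normal(key_x, (7, 4, 16))  # (seq_len=7, batch=4, input_dim=16)
fwd = init_lstm(key_f, 16, 32)
bwd = init_lstm(key_b, 16, 32)
features = bilstm_encode(fwd, bwd, xs)
```

A stacked LSTM would feed the hidden-state sequence returned by one lstm_layer call into the next, and a classifier head (for example a linear layer with softmax) would sit on top of the resulting features.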

4. Dataset

The dataset is a Facebook dataset containing 526,732 Sinhala and English posts extracted from CrowdTangle. It consists of a decade’s worth of content from Facebook pages popular in Sri Lanka.

5. Training scripts

The following links contain the model scripts:

Main models

6. Challenges

Several models need to be adjusted and tested.

7. Desired project outcome

Performance measures for each model
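
As one hedged example of such performance measures, the sketch below computes accuracy and macro-averaged F1 from predicted and true labels with jax.numpy, via a confusion matrix. The choice of these two metrics and the three-class label set are assumptions; the project description does not specify which measures will be reported.

```python
import jax.numpy as jnp


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    idx = y_true * num_classes + y_pred
    counts = jnp.bincount(idx, length=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def accuracy_and_macro_f1(y_true, y_pred, num_classes):
    cm = confusion_matrix(y_true, y_pred, num_classes).astype(jnp.float32)
    tp = jnp.diag(cm)
    precision = tp / jnp.maximum(cm.sum(axis=0), 1.0)  # column sums = predicted counts
    recall = tp / jnp.maximum(cm.sum(axis=1), 1.0)     # row sums = true counts
    denom = precision + recall
    f1 = jnp.where(denom > 0, 2 * precision * recall / jnp.maximum(denom, 1e-12), 0.0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, f1.mean()  # macro-F1 = unweighted mean over classes


# Made-up labels for illustration only.
y_true = jnp.array([0, 0, 1, 1, 2, 2])
y_pred = jnp.array([0, 1, 1, 1, 2, 0])
acc, macro_f1 = accuracy_and_macro_f1(y_true, y_pred, num_classes=3)
# accuracy ≈ 0.667, macro-F1 ≈ 0.656
```

Macro-averaging weights every class equally, which may matter if the sentiment classes in the dataset turn out to be imbalanced.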

8. Further reading

The following links may be useful for understanding the project and the work that has been done previously.

https://sencat.lk/