The aim of the project is to improve online safety for vulnerable populations (e.g. adolescents). We would like to train NLP and CV models to detect hate speech, hateful memes, and toxic comments, and potentially to identify online predators as users engage in conversations or browse a website.
The models will be trained on English data.
We will fine-tune pre-trained GPT-2 and ViT models for text and image classification respectively, but we are also open to testing other models such as ELECTRA and RoBERTa.
Possible links to publicly available datasets include:
- Toxic Comment Classification Challenge (Kaggle)
- Hateful Memes Challenge and dataset
We can start with the existing Flax scripts for sequence classification (`run_flax_glue.py` in the huggingface/transformers repository on GitHub) but will likely have to create our own for ViT.
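As a rough sketch of what such a fine-tuning script does under the hood, the snippet below implements a single softmax-cross-entropy training step on a linear classification head, in plain NumPy rather than Flax. The random `features` array is a stand-in for pooled encoder outputs (e.g. from GPT-2 or ViT); in the real script the encoder parameters would be updated as well.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def train_step(W, b, features, labels, lr=0.1):
    """One SGD step on a linear classification head.

    `features` stands in for pooled encoder embeddings; `labels` are
    binary toxic / non-toxic targets.
    """
    logits = features @ W + b
    probs = softmax(logits)
    # Gradient of mean cross-entropy w.r.t. logits: probs - one_hot(labels).
    grad_logits = probs.copy()
    grad_logits[np.arange(len(labels)), labels] -= 1.0
    grad_logits /= len(labels)
    W_new = W - lr * (features.T @ grad_logits)
    b_new = b - lr * grad_logits.sum(axis=0)
    loss = -np.log(probs[np.arange(len(labels)), labels]).mean()
    return W_new, b_new, loss

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))   # batch of 8 "pooled embeddings"
labels = rng.integers(0, 2, size=8)   # binary toxic / non-toxic labels
W, b = np.zeros((16, 2)), np.zeros(2)
_, _, loss0 = train_step(W, b, features, labels)
for _ in range(50):
    W, b, loss = train_step(W, b, features, labels)
print(loss0, loss)  # loss decreases on this tiny batch
```

The Flax version replaces the hand-written gradient with `jax.grad` over the full model, but the loss and update structure are the same.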
One challenge will be to correctly identify the intent of a given statement within the broader context of the visited site. For example, the presence of profanities can heavily skew the evaluation of a sentence even when the sentence as a whole conveys a positive sentiment.
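To make the challenge concrete, here is a toy illustration (with a hypothetical keyword list) of why naive keyword matching is not enough: a filter that flags any sentence containing a listed word treats a positive sentence and an insult identically.

```python
# Hypothetical keyword list for illustration only.
PROFANITY = {"damn", "hell"}

def naive_flag(sentence):
    # Flags purely on word presence, ignoring surrounding context.
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    return bool(words & PROFANITY)

positive = "This movie was so damn good, I loved it!"
negative = "You are a damn fool."
print(naive_flag(positive), naive_flag(negative))  # True True - both flagged
```

A context-aware model should separate these two cases; a keyword filter cannot.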
A second challenge will be the model's ability to correctly identify forms of toxicity it has not yet encountered, and to mitigate biases (e.g. against particular ethnicities) that could arise from the training data.
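One simple way to audit such biases, sketched below with hypothetical evaluation data, is to compare false positive rates across demographic groups: a model that disproportionately flags non-toxic comments mentioning one identity group is biased even if its overall accuracy looks fine.

```python
from collections import defaultdict

def false_positive_rate_by_group(labels, preds, groups):
    """FPR per group: P(pred = toxic | label = non-toxic, group)."""
    fp = defaultdict(int)
    neg = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        if y == 0:                 # ground truth: non-toxic
            neg[g] += 1
            if p == 1:             # model said: toxic
                fp[g] += 1
    return {g: fp[g] / neg[g] for g in neg}

# Hypothetical evaluation data: all comments are non-toxic,
# but predictions differ by identity group mentioned.
labels = [0, 0, 0, 0, 0, 0]
preds  = [1, 0, 0, 1, 1, 0]
groups = ["A", "A", "A", "B", "B", "B"]
print(false_positive_rate_by_group(labels, preds, groups))
# group B is flagged twice as often as group A
```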
(Optional) Desired project outcome
We would like to have a demo that runs in near real time. Future work could adopt a multimodal approach to tackle videos and other forms of media.
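For the near-real-time goal, per-comment latency is the metric to track. The sketch below uses a dummy `classify` function (a placeholder, not the real model) to show the kind of measurement loop the demo would need around actual model inference.

```python
import time

def classify(text):
    # Placeholder standing in for real model inference
    # (e.g. a fine-tuned GPT-2 classification head).
    time.sleep(0.01)  # simulated inference cost
    return "toxic" if "fool" in text else "ok"

comments = ["nice post!", "you fool", "great point"] * 10
start = time.perf_counter()
results = [classify(c) for c in comments]
latency_ms = (time.perf_counter() - start) / len(comments) * 1000
print(f"avg latency: {latency_ms:.1f} ms per comment")
```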
The following links may be useful for understanding the project and what has been done previously.
- Detecting predatory conversations in social media by deep Convolutional Neural Networks
- Contextualizing Hate Speech Classifiers with Post-hoc Explanation: arXiv:2005.02439
- Detecting Online Hate Speech Using Context Aware Models: https://arxiv.org/pdf/1710.07395.pdf