Create a detector of toxicity from political tweets in Spain

JaviBJ · November 16, 2021, 8:03pm

Please read the topic category description to understand what this is all about.

Description

The goal of this project is to automatically identify toxic speech by politicians on Twitter. It focuses on Spain, an interesting multilingual case: the country has several co-official languages, and politicians switch between them interchangeably.

Model(s)

Multilingual models like xlm-roberta-base.
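As a starting point, xlm-roberta-base can be loaded with a fresh classification head through the 🤗 Transformers Auto classes. This is a minimal sketch, assuming a binary toxic/non-toxic label scheme (the label names and the helper functions are hypothetical). The head is randomly initialised, so the model must be fine-tuned on labelled tweets before its predictions mean anything.

```python
# Hypothetical binary label scheme: 0 = non-toxic, 1 = toxic.
ID2LABEL = {0: "non-toxic", 1: "toxic"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def load_model(model_name="xlm-roberta-base"):
    """Load a multilingual encoder plus a new (untrained) 2-way classification head."""
    # Imported lazily so the label maps above can be used without Transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(ID2LABEL),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model


def encode(tokenizer, tweets, max_length=128):
    """Tokenise a batch of tweets into PyTorch tensors (tweets are short, so 128 tokens is ample)."""
    return tokenizer(
        tweets,
        truncation=True,
        padding=True,
        max_length=max_length,
        return_tensors="pt",
    )
```

For example, `tokenizer, model = load_model()` followed by `model(**encode(tokenizer, ["Texto de ejemplo"])).logits` gives a tensor of shape `(1, 2)`, one score per label.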

Datasets

tweet_eval is a related resource, but it is English-only.

Challenges

Getting high-quality data in Spanish and/or integrating data in other languages.
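One small, concrete part of integrating data from several sources is normalising tweets the same way everywhere. tweet_eval replaces user mentions with `@user`. The sketch below assumes the Spanish (and other co-official language) tweets should follow the same convention, and it also collapses links to a single `http` placeholder, which is our assumption rather than something the project specifies. `normalize_tweet` is a hypothetical helper name.

```python
def normalize_tweet(text: str) -> str:
    """Replace user mentions with '@user' and links with 'http' (assumed convention)."""
    tokens = []
    for token in text.split(" "):
        if token.startswith("@") and len(token) > 1:
            # Anonymise the mentioned account, matching tweet_eval's '@user' placeholder.
            token = "@user"
        elif token.startswith("http"):
            # Collapse every URL to one placeholder so links do not add vocabulary noise.
            token = "http"
        tokens.append(token)
    return " ".join(tokens)
```

Applying the same function to every source before merging keeps a mention or link from looking different depending on which dataset a tweet came from.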

Desired project outcomes

Create a Streamlit or Gradio app on Spaces that can detect toxicity in tweets.
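A Gradio app for Spaces can be quite small. The sketch below assumes a fine-tuned checkpoint has been pushed to the Hub under a hypothetical id (`your-username/xlm-roberta-toxic-tweets-es`) and wraps it in a 🤗 Transformers `text-classification` pipeline. `format_prediction` (a hypothetical helper) converts the pipeline output into the `{label: score}` dictionary that `gr.Label` displays.

```python
# app.py: minimal Gradio sketch for the toxicity detector.
MODEL_ID = "your-username/xlm-roberta-toxic-tweets-es"  # hypothetical Hub id


def format_prediction(results):
    """Convert text-classification output into {label: score} for gr.Label."""
    # Some Transformers versions nest the per-label scores in an extra list.
    if results and isinstance(results[0], list):
        results = results[0]
    return {r["label"]: float(r["score"]) for r in results}


def build_demo(model_id=MODEL_ID):
    # Imported here so format_prediction can be used without these packages.
    import gradio as gr
    from transformers import pipeline

    # top_k=None asks the pipeline for a score for every label, not just the best one.
    classifier = pipeline("text-classification", model=model_id, top_k=None)

    def predict(tweet):
        return format_prediction(classifier(tweet))

    return gr.Interface(
        fn=predict,
        inputs=gr.Textbox(lines=3, label="Tweet"),
        outputs=gr.Label(label="Toxicity"),
        title="Toxic tweet detector (ES)",
    )
```

On Spaces, `app.py` would end with `build_demo().launch()`. Building the interface inside a function keeps the model download out of import time, which makes the formatting logic easy to test on its own.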

Discord channel

To chat and organise with other people interested in this project, head over to our Discord and:

Follow the instructions on the #join-course channel
Join the #toxic-tweets-es channel

Just make sure you comment here to indicate that you’ll be contributing to this project.

MarcBrun · November 16, 2021, 8:17pm

I’d love to work on this project!

saraestevez · November 17, 2021, 7:29am

Hi! I’m also collaborating on this project.
