The goal of this project is to automatically identify toxic speech posted by politicians on Twitter. It focuses on Spain, an interesting multilingual case: the country has several co-official languages that are used interchangeably in politics.


Candidate models: multilingual models such as xlm-roberta-base.
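
As a rough starting point, here is a minimal sketch of how xlm-roberta-base could be set up for toxicity classification with the transformers library. The two-label scheme ("non-toxic" / "toxic") and the helper names `label_maps` and `load_model` are assumptions for illustration, not something the project has settled on. Note that the classification head is randomly initialized, so the model still needs fine-tuning on labeled tweets before its predictions mean anything.

```python
# Hypothetical label set; the project has not fixed a labeling scheme yet.
LABELS = ["non-toxic", "toxic"]


def label_maps(labels):
    """Build the id<->label mappings that transformers stores in the model config."""
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


def load_model(model_name="xlm-roberta-base", labels=LABELS):
    """Load tokenizer and model with a fresh (untrained) classification head."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

Calling `tokenizer, model = load_model()` downloads the checkpoint from the Hub. Because the same tokenizer covers many languages, tweets in the different co-official languages could in principle be fed to the same model.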


  • tweet_eval is a related resource, but it is English-only.


  • Getting high-quality data in Spanish and/or integrating data in other languages.

Desired project outcomes

  • Create a Streamlit or Gradio app on 🤗 Spaces that detects toxicity in tweets.
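
For the Gradio option, the app could look roughly like the sketch below. It assumes a hypothetical `score_fn` that takes a tweet and returns two raw logits (for example, from a fine-tuned classifier); the label names and the helpers `to_label_scores` and `build_demo` are likewise made up for illustration.

```python
import math

# Hypothetical label set, in the same order as the logits from score_fn.
LABELS = ["non-toxic", "toxic"]


def softmax(logits):
    """Turn raw logits into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_label_scores(logits, labels=LABELS):
    """Map logits to the {label: probability} dict that gr.Label can display."""
    return dict(zip(labels, softmax(logits)))


def build_demo(score_fn):
    """Wrap score_fn (tweet text -> list of logits) in a simple Gradio interface."""
    # Imported lazily so the helpers above can be used without gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=lambda text: to_label_scores(score_fn(text)),
        inputs=gr.Textbox(label="Tweet"),
        outputs=gr.Label(num_top_classes=len(LABELS)),
        title="Toxicity detection for tweets",
    )
```

With a real scoring function in place, `build_demo(score_fn).launch()` would start the app locally; on Spaces the same code would live in `app.py`.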

I’d love to work on this project!


Hi! I’m also collaborating on this project.