This is an initial list of Arabic datasets:
Dataset | Paper | Link |
---|---|---|
SOQAL | Neural Arabic Question Answering | https://github.com/husseinmozannar/SOQAL |
HARD | Hotel Arabic-Reviews Dataset Construction for Sentiment Analysis Applications | https://github.com/elnagara/HARD-Arabic-Dataset |
ArSenTD-LEV | A Multi-Topic Corpus for Target-based Sentiment Analysis in Arabic Levantine Tweets | http://oma-project.com/ArSenL/ArSenTD_Lev_Intro |
ANERcorp | ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy | http://curtis.ml.cmu.edu/w/courses/index.php/ANERcorp |
LABR | A Large-Scale Arabic Book Reviews Dataset | https://github.com/mohamedadaly/LABR |
AJGT | N/A | https://github.com/komari6/Arabic-twitter-corpus-AJGT |
Multi-datasets | Building Large Arabic Multi-domain Resources for Sentiment Analysis | https://github.com/hadyelsahar/large-arabic-sentiment-analysis-resouces |
TEAD | Using Tweets and Emojis to Build TEAD: an Arabic Dataset for Sentiment Analysis | https://github.com/HSMAabdellaoui/TEAD |
COVID-19 dataset | Large Arabic Twitter Dataset on COVID-19 | https://github.com/SarahAlqurashi/COVID-19-Arabic-Tweets-Dataset |