Looking for (classifier, dataset) pairs across languages (or just classification datasets)

jxm · August 18, 2020, 6:24pm

Hi everyone,

I’m looking for pairs of (transformers model hub) models and their associated (nlp) datasets across languages. The goal is to be able to try text classification in a bunch of different languages, easily.

(model, dataset) pairs
Here are two examples I’ve found so far:

English
- model: textattack/bert-base-uncased-rotten-tomatoes
- dataset: rotten_tomatoes on nlp
French
- model: tblard/tf-allocine
- dataset: allocine on nlp
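
To make the goal concrete, here is a minimal sketch of how I imagine using such a pair. Several things in it are assumptions rather than confirmed facts: that the `nlp` library is importable under its later name `datasets`, that the text columns are called `text` and `review`, that each model's `label2id` mapping lines up with the dataset's integer labels, and that the pipeline accepts `truncation=True`. The `Pair` class and the `evaluate_pair` helper are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    language: str
    model: str        # model hub identifier
    dataset: str      # dataset name
    text_column: str  # assumed name of the input text field


PAIRS = [
    Pair("English", "textattack/bert-base-uncased-rotten-tomatoes",
         "rotten_tomatoes", "text"),
    Pair("French", "tblard/tf-allocine", "allocine", "review"),
]


def accuracy(predictions, labels):
    """Return the fraction of predictions that equal the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def evaluate_pair(pair, split="test", limit=100):
    """Score one (model, dataset) pair on a small shuffled sample."""
    # Heavy imports stay local so the helpers above work without them.
    from datasets import load_dataset  # assumption: `nlp` under its later name
    from transformers import pipeline

    # Shuffle before sampling so the sample is not dominated by one label.
    data = load_dataset(pair.dataset, split=split).shuffle(seed=0)
    data = data.select(range(min(limit, len(data))))

    # The French model is a TensorFlow checkpoint, so TensorFlow may be needed.
    clf = pipeline("text-classification", model=pair.model)
    outputs = clf(list(data[pair.text_column]), truncation=True)

    # Assumption: the model's label ids match the dataset's label ids.
    label2id = clf.model.config.label2id
    predictions = [label2id[out["label"]] for out in outputs]
    return accuracy(predictions, list(data["label"]))
```

Something like `evaluate_pair(PAIRS[0])` would then download the English model and dataset and return an accuracy on the sample. Adding a language should only require appending another `Pair` to the list, which is why I am after more of them.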

Multi-lingual classification datasets

Equally good would be links to nlp classification datasets in languages besides French and English. Easiest of all would be a single classification dataset with inputs in many languages (does that exist?). In that case, I could train the models myself (though it’s always nice when they’re already trained for you!).

Please let me know! Thanks.
