Hi everyone,
I’m looking for pairs of models (from the transformers model hub) and their associated datasets (from the nlp library) across languages. The goal is to be able to try text classification in a bunch of different languages easily.
(model, dataset) pairs
Here are two examples I’ve found so far:
- English
- model: textattack/bert-base-uncased-rotten-tomatoes
- dataset: rotten_tomatoes on nlp
- French
- model: tblard/tf-allocine
- dataset: allocine on nlp
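To make it concrete, here is a rough sketch of how I imagine using such pairs. It assumes the `datasets` package (the renamed successor of `nlp`) and the `transformers` `pipeline` API. The `Pair` record, the `classify_samples` helper, the `framework` values, and the text column names ("text" / "review") are my own assumptions for illustration, not something taken from the model cards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One (model, dataset) pair for a given language (hypothetical helper type)."""
    language: str
    model: str      # model id on the model hub
    dataset: str    # dataset name passed to load_dataset
    framework: str  # "pt" or "tf"; assumed from the model names, not verified


# The two pairs listed above.
PAIRS = [
    Pair("English", "textattack/bert-base-uncased-rotten-tomatoes", "rotten_tomatoes", "pt"),
    Pair("French", "tblard/tf-allocine", "allocine", "tf"),
]


def classify_samples(pair, n=5, split="test"):
    """Run the pair's model on the first n examples of the pair's dataset."""
    # Imported lazily so the PAIRS table can be used without these packages installed.
    from datasets import load_dataset
    from transformers import pipeline

    dataset = load_dataset(pair.dataset, split=f"{split}[:{n}]")
    # Assumption: the input column is "text" or, failing that, "review".
    text_column = "text" if "text" in dataset.column_names else "review"
    classifier = pipeline("text-classification", model=pair.model, framework=pair.framework)
    # truncation=True guards against reviews longer than the model's maximum input length.
    return classifier(dataset[text_column], truncation=True)
```

Calling `classify_samples(PAIRS[0])` should download the model and dataset and return one label/score dict per example. Adding a language would then just mean appending another `Pair`.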
Multi-lingual classification datasets
Equally good would be links to nlp classification datasets in languages besides French and English. Easiest of all would be a single classification dataset with inputs in many languages (does that exist?). In that case, I could work on training the models myself (though it’s always nice when they’re trained for you!).
Please let me know! Thanks.