Hello together,
I got a labeled dataset with 5 different languages (en, hi, ta, ml, bn) and want to to a text classification task with the aim to predict if a text is a claim or not.
One first approach for me would be to try out, if dublicating the training data by translating it into another language would give me better results for my model (this would be helpful for resource poor countries).
As I am a beginner I wanted to ask how experienced developers would approach this task/ which multilingual model to use to train the model with at least two different languages/ or what would be the general steps to solve my approach?
Thanks in advance for your suggestions!
Kind regards