How to translate Tweets Sentiment Extraction dataset


I want to create a French version of Manuel Romero’s mrm8488/t5-base-finetuned-span-sentiment-extraction, using the Google Colab notebook by Lorenzo Ampil. To do this, I need to translate the whole Tweets Sentiment Extraction dataset to French. I have tried using Helsinki-NLP/opus-mt-fr-en as my translation model, but it keeps giving errors. Also, the estimated time to translate the text and selected_text columns is around 20 hours!

Is there a “simpler” way to translate English datasets to other languages like French?

Maybe ?


@savasy thanks for the help. I had tried the OPUS-MT models and mBART, and I have to admit I wasn’t happy with the quality of the NL translation. I ended up using the Google Translate API and was very happy with the results:
repo: ssut/py-googletrans on GitHub (Googletrans, an unofficial, free Google Translate API for Python)
notebook: Google Colab
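
For anyone who still wants to try the OPUS-MT route: Helsinki-NLP/opus-mt-fr-en translates French to English, so for English to French the checkpoint would be Helsinki-NLP/opus-mt-en-fr. Below is a minimal sketch of one way to cut down the runtime, by translating in batches and sending each distinct string only once. The helper names (`translate_unique`, `make_marian_translator`), the batch size, and the `text_fr` column name are my own assumptions and not taken from any of the notebooks mentioned above. I have not measured how much time this saves on this dataset.

```python
from typing import Callable, Iterable, List


def translate_unique(
    texts: Iterable,
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 32,
) -> list:
    """Translate texts in batches, sending each distinct string only once.

    Hypothetical helper: `translate_batch` is any function that maps a list
    of strings to a list of translations of the same length.
    """
    texts = list(texts)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    # Empty strings and non-string values (e.g. missing cells) are skipped.
    unique = [t for t in dict.fromkeys(texts) if isinstance(t, str) and t.strip()]
    cache = {}
    for start in range(0, len(unique), batch_size):
        batch = unique[start:start + batch_size]
        outputs = translate_batch(batch)
        if len(outputs) != len(batch):
            raise ValueError("translate_batch must return one output per input")
        cache.update(zip(batch, outputs))
    # Anything that was skipped above is passed through unchanged.
    return [cache.get(t, t) if isinstance(t, str) else t for t in texts]


def make_marian_translator(model_name: str = "Helsinki-NLP/opus-mt-en-fr"):
    """Build a batch translator backed by a MarianMT checkpoint (assumed setup)."""
    # Imported lazily so translate_unique can be used without these packages.
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name).to(device).eval()

    def translate_batch(batch: List[str]) -> List[str]:
        inputs = tokenizer(
            batch, return_tensors="pt", padding=True, truncation=True
        ).to(device)
        with torch.no_grad():
            generated = model.generate(**inputs)
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    return translate_batch


# Example usage with a pandas DataFrame `df` (not run here):
# translator = make_marian_translator()
# df["text_fr"] = translate_unique(df["text"], translator)
```

Passing the translation function in as an argument means the same batching code can wrap a different backend, as long as it takes a list of strings and returns a list of the same length.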