Translate short sentence

Hello everyone,
I am new to this field. I have set up a small API to dynamically translate my website. I have used several multilingual models because I need French, English, German, and Dutch (and soon Italian).

I am a beginner in Python, so I used ChatGPT to help me create this API.

Unfortunately, the translation quality is not very good…

  • “Prénom” → “Name of family” in English
  • “Envoyer” → “Send” in German

ChatGPT told me that the models I used were not designed for short sentences and that they work better with longer texts. It recommended using more traditional methods for this type of translation. I wanted to get your opinion and see which model you would recommend.

For now, I am leaning towards using a standard Google API for the site’s translation files and my API for translating user product descriptions.

Thanks in advance,

Methos

That’s true. With short sentences and single words, there’s not much benefit to using an LLM. There are ways to use smaller LMs, etc., but it’s true that you can simply use Google Translate.


To address the translation issues with short sentences on your website, here are some recommendations that balance ease of implementation with effectiveness:

  1. HuggingFace OPUS-MT Model: The “Helsinki-NLP/opus-mt-mul-en” model works with the Transformers library and translates from many source languages (including French, German, Dutch, and Italian) into English only. For other directions, OPUS-MT publishes separate per-pair models such as “Helsinki-NLP/opus-mt-fr-de”.

  2. Facebook’s mBART or M2M Models: Use the EasyNMT package to load these multilingual models, which translate directly between many language pairs with a single model and can be integrated with minimal code. Check their output on your own short phrases, since quality on isolated words can vary.

  3. MarianMT Models: This is the model family that the OPUS-MT checkpoints belong to. The models are small and fast, and they come in many language pairs, which makes them suitable for quick processing on a website.

  4. DeepL API: If budget allows, DeepL’s API offers high-quality translations, particularly effective for short texts, though it may have usage limits.

  5. Testing and Evaluation: Develop a script to test different models with your specific short phrases. This hands-on approach will help you evaluate which model provides the best accuracy for your needs.

  6. Adjustments for Short Texts: Make sure each input is tokenized with the model’s own tokenizer and that padding is applied when several texts of different lengths are batched together. These adjustments can improve translation quality for short sentences.
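The testing step in item 5 can be sketched as a tiny harness. This is only an illustration: `CASES`, `evaluate`, and `accuracy` are hypothetical names, the expected translations are example values to replace with your own labels, and `translate` stands for any function that wraps one of the models above.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical test cases: (source_lang, target_lang, text) -> expected translation.
# The entries are illustrative; replace them with your real UI labels.
CASES: Dict[Tuple[str, str, str], str] = {
    ("fr", "en", "Prénom"): "First name",
    ("fr", "de", "Envoyer"): "Senden",
}

# Any callable taking (text, src, tgt) and returning the translated string.
Translator = Callable[[str, str, str], str]


def evaluate(translate: Translator,
             cases: Dict[Tuple[str, str, str], str] = CASES) -> List[dict]:
    """Run every case through `translate` and return the mismatches."""
    failures = []
    for (src, tgt, text), expected in cases.items():
        got = translate(text, src, tgt)
        # Compare case-insensitively and ignore surrounding whitespace.
        if got.strip().lower() != expected.strip().lower():
            failures.append({"text": text, "src": src, "tgt": tgt,
                             "expected": expected, "got": got})
    return failures


def accuracy(translate: Translator,
             cases: Dict[Tuple[str, str, str], str] = CASES) -> float:
    """Fraction of cases whose translation matches the expected value."""
    return 1 - len(evaluate(translate, cases)) / len(cases)
```

Running `evaluate` once per candidate model and comparing the failure lists gives a quick, repeatable way to see which model handles your short phrases best. Exact string matching is strict, so a failure is a prompt to look at the output, not proof that it is wrong.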

Implementation Steps:

  • Start with installing necessary libraries like transformers or easynmt.
  • Use HuggingFace’s pipeline for OPUS-MT or EasyNMT for Facebook’s models.
  • Consider setting up a test environment to evaluate each model’s performance.
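As a rough sketch of these steps for OPUS-MT, the snippet below loads one per-pair MarianMT checkpoint and translates a batch of short texts. It uses the `MarianMTModel` and `MarianTokenizer` classes directly instead of `pipeline`, and it assumes `transformers`, `torch`, and `sentencepiece` are installed. The helper names (`opus_mt_name`, `load_pair`, `translate_batch`) are made up for this example, and not every language pair necessarily has a checkpoint, so check the Hub before relying on a given name.

```python
from functools import lru_cache
from typing import List


def opus_mt_name(src: str, tgt: str) -> str:
    """Build the conventional OPUS-MT checkpoint name for a language pair.

    Assumption: a checkpoint exists for this pair; verify it on the Hub first.
    """
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


@lru_cache(maxsize=None)
def load_pair(src: str, tgt: str):
    """Load tokenizer and model once per language pair (downloads on first use)."""
    # Imported lazily so this module can be imported without loading PyTorch.
    from transformers import MarianMTModel, MarianTokenizer

    name = opus_mt_name(src, tgt)
    return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)


def translate_batch(texts: List[str], src: str, tgt: str) -> List[str]:
    """Translate a list of short texts in a single batch."""
    tokenizer, model = load_pair(src, tgt)
    # padding=True pads to the longest text in the batch; no fixed length is needed.
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

A call such as `translate_batch(["Prénom", "Envoyer"], "fr", "de")` would download the French-to-German checkpoint on first use and reuse it afterwards thanks to the cache.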

By exploring these options, you can enhance the accuracy of your translations while maintaining ease of use, especially since you’re a Python beginner.

Thank you very much, I’ll try that tomorrow afternoon!

I tried reading your post, but it’s a bit complicated ^^ Is it really for beginners? xD
Here’s the code I have so far.


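from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
# The model is loaded once at startup. M2M100 expects language codes such as "fr", "en", "de", "nl", "it".
pipe = pipeline(task='text2text-generation', model='facebook/m2m100_418M')

class TranslationRequest(BaseModel):
    source: str  # source language code, e.g. "fr"
    target: str  # target language code, e.g. "de"
    texts: dict  # key -> text to translate

@app.post("/translate")
def translate(request: TranslationRequest):
    translations = {}
    # M2M100 does not use a T5-style text prefix ("translate French to German: ");
    # such a prefix is treated as part of the text to translate. Instead, the source
    # language is set on the tokenizer and the target language is forced as the
    # first generated token.
    # Note: this mutates shared tokenizer state, so concurrent requests with
    # different source languages can interfere with each other.
    pipe.tokenizer.src_lang = request.source
    target_id = pipe.tokenizer.get_lang_id(request.target)

    for key, text in request.texts.items():
        original_text = text.strip()
        if not original_text:
            # Nothing to translate; skip the model call.
            translations[key] = ""
            continue

        result = pipe(original_text, forced_bos_token_id=target_id)
        translations[key] = result[0]["generated_text"]

    return translations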

I already have the code for Google Translate, but that means I would have to use both technologies since I’d have both short texts and slightly longer ones. Ideally, it would be best to have just one technology for both situations.

Well, when it comes to creating APIs for the web, even if it’s aimed at beginners, it can be a little difficult…:sweat_smile:

The API works, only the translations are bad ^^

I think it’s often difficult to get satisfactory performance with the base model as it is. It’s a good idea to try out various models and, if they don’t seem to work, either fine-tune them or use a ready-made service.
https://huggingface.co/models?pipeline_tag=text2text-generation&library=transformers&sort=trending
https://huggingface.co/models?pipeline_tag=translation&library=transformers&sort=trending
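
For the mix of short interface labels and longer product descriptions mentioned earlier in the thread, one option is to keep a single entry point that first checks a hand-verified glossary and only calls a model (or an external service) when there is no entry. The sketch below is just that idea in code: `UI_GLOSSARY` and `translate_text` are hypothetical names, the glossary entries are examples, and `model_translate` stands for whichever backend you end up choosing.

```python
from typing import Callable, Dict, Tuple

# Hypothetical hand-checked glossary for UI labels:
# (source_lang, target_lang, lowercased text) -> translation.
UI_GLOSSARY: Dict[Tuple[str, str, str], str] = {
    ("fr", "en", "prénom"): "First name",
    ("fr", "de", "envoyer"): "Senden",
}


def translate_text(text: str, src: str, tgt: str,
                   model_translate: Callable[[str, str, str], str],
                   glossary: Dict[Tuple[str, str, str], str] = UI_GLOSSARY) -> str:
    """Return the glossary entry if one exists, otherwise ask the model."""
    cleaned = text.strip()
    if not cleaned:
        return ""
    # Glossary keys are lowercased so "Prénom" and "prénom" hit the same entry.
    hit = glossary.get((src, tgt, cleaned.lower()))
    if hit is not None:
        return hit
    # No verified translation available: fall back to the model or service.
    return model_translate(cleaned, src, tgt)
```

With this layout the website only ever calls `translate_text`, so the backend behind `model_translate` can be swapped later without touching the rest of the code.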