Translate short sentence

Methos02 · March 18, 2025, 2:20pm

Hello everyone,
I am new to this field. I have set up a small API to dynamically translate my website. I have used several multilingual models because I need French, English, German, and Dutch (and soon Italian).

I am a beginner in Python, so I used ChatGPT to help me create this API.

Unfortunately, the translation quality is not very good…

“Prénom” → “Name of family” in English
“Envoyer” → “Send” in German

ChatGPT told me that the models I used were not designed for short sentences and that they work better with longer texts. It recommended using more traditional methods for this type of translation. I wanted to get your opinion and see which model you would recommend.

For now, I am leaning towards using a standard Google API for the site’s translation files and my API for translating user product descriptions.

Thanks in advance,

Methos

John6666 · March 18, 2025, 4:36pm

That’s true. With short sentences and single words, there’s not much benefit to using an LLM. Smaller language models are an option, but Google Translate is also a reasonable choice.

To address the translation issues with short sentences on your website, here are some recommendations that balance ease of implementation with effectiveness:

HuggingFace OPUS-MT Models: The “Helsinki-NLP/opus-mt-mul-en” model translates from many source languages (including French, German, Dutch, and Italian) into English only. For other directions, such as French to German, use a pair-specific OPUS-MT checkpoint, for example “Helsinki-NLP/opus-mt-fr-de”. These models are easy to use with the Transformers library.
Facebook’s mBART or M2M Models: Use the EasyNMT package to leverage these models, which are multilingual and support a wide range of languages. They are efficient for short sentences and can be integrated with minimal code.
MarianMT Models: This is the architecture behind the OPUS-MT checkpoints. The models are small and fast, and each checkpoint covers a specific language pair or language group, which makes them suitable for quick processing on a website.
DeepL API: If budget allows, DeepL’s API offers high-quality translations, particularly effective for short texts, though it may have usage limits.
Testing and Evaluation: Develop a script to test different models with your specific short phrases. This hands-on approach will help you evaluate which model provides the best accuracy for your needs.
Adjustments for Short Texts: Ensure proper tokenization and padding are applied, as some models may require fixed input lengths. These adjustments can improve translation quality for short sentences.

Implementation Steps:

Start with installing necessary libraries like transformers or easynmt.
Use HuggingFace’s pipeline for OPUS-MT or EasyNMT for Facebook’s models.
Consider setting up a test environment to evaluate each model’s performance.
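
The pipeline step above can be sketched as follows. This is a minimal sketch, not a definitive setup: it assumes `transformers`, a backend such as PyTorch, and `sentencepiece` are installed, and that the installed Transformers version still provides the `translation` pipeline task. The helper names `opus_mt_model_name` and `load_translator` are hypothetical, and not every language pair has a published OPUS-MT checkpoint, so the generated name should be checked on the Hub before use.

```python
from typing import Callable


def opus_mt_model_name(source: str, target: str) -> str:
    """Build the conventional Hub name of a pair-specific OPUS-MT checkpoint.

    Assumption: the pair exists on the Hub (e.g. "fr" -> "de"); many do, not all.
    """
    return f"Helsinki-NLP/opus-mt-{source}-{target}"


def load_translator(source: str, target: str) -> Callable[[str], str]:
    """Download the checkpoint once and return a function translating one string."""
    # Imported lazily so the name helper above works without transformers installed.
    from transformers import pipeline

    pipe = pipeline("translation", model=opus_mt_model_name(source, target))

    def translate(text: str) -> str:
        # The translation pipeline returns a list of dicts keyed by "translation_text".
        return pipe(text.strip())[0]["translation_text"]

    return translate
```

A call such as `load_translator("fr", "en")("Envoyer")` would download the model on first use and return the translated string. Loading the translator once at startup, rather than per request, keeps the API responsive.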

By exploring these options, you can enhance the accuracy of your translations while maintaining ease of use, especially since you’re a Python beginner.
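
The testing suggestion above could look something like this small comparison script. It is only a sketch under assumptions: the phrase list and the accepted translations are illustrative and should be replaced with real strings from the site, and each translator is assumed to be a plain function that takes a string and returns a string, whatever model or service sits behind it.

```python
from typing import Callable, Dict, List, Set, Tuple

# (source text, accepted translations) pairs. Illustrative values only.
CASES: List[Tuple[str, Set[str]]] = [
    ("Prénom", {"first name", "given name"}),
    ("Envoyer", {"send", "submit"}),
]


def normalize(text: str) -> str:
    """Ignore case, surrounding whitespace, and trailing punctuation."""
    return text.strip().strip(".!").lower()


def score(translate: Callable[[str], str], cases: List[Tuple[str, Set[str]]]) -> float:
    """Fraction of phrases whose translation is one of the accepted answers."""
    hits = sum(1 for src, accepted in cases if normalize(translate(src)) in accepted)
    return hits / len(cases)


def compare(
    translators: Dict[str, Callable[[str], str]],
    cases: List[Tuple[str, Set[str]]],
) -> List[Tuple[str, float]]:
    """Score every translator and return them best first."""
    results = [(name, score(fn, cases)) for name, fn in translators.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Exact matching against a list of accepted answers is a deliberately simple measure. It suits short interface strings, where there are only a few correct translations, but it would be too strict for longer product descriptions.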

Methos02 · March 20, 2025, 2:50pm

Thank you very much, I’ll try that tomorrow afternoon!

Methos02 · March 23, 2025, 11:47am

I tried reading your post, but it’s a bit complicated ^^ Is it really for beginners? xD
Here’s the code I have so far.


from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
pipe = pipeline(task='text2text-generation', model='facebook/m2m100_418M')

class TranslationRequest(BaseModel):
    source: str
    target: str
    texts: dict

@app.post("/translate")
def translate(request: TranslationRequest):
    translations = {}
    prefix = "translate French to German: "

    for key, text in request.texts.items():
        original_text = text.strip()

        result = pipe(
            prefix + original_text,
            forced_bos_token_id=pipe.tokenizer.get_lang_id(request.target)
        )

        translations[key] = result[0]["generated_text"]

    return translations

I already have the code for Google Translate, but that means I would have to use both technologies since I’d have both short texts and slightly longer ones. Ideally, it would be best to have just one technology for both situations.
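
One way to keep a single entry point for both kinds of text is a lookup-first approach: short interface strings come from a hand-checked table, and everything else goes to a machine translation backend. The sketch below is only an illustration of that idea. The table contents, the name `translate_with_fallback`, and the shape of the `machine_translate` callable are all assumptions, not part of the code above.

```python
from typing import Callable, Dict, Tuple

# Hand-checked translations for short interface strings, keyed by
# (source language, target language, text). Entries are illustrative.
UI_STRINGS: Dict[Tuple[str, str, str], str] = {
    ("fr", "en", "Prénom"): "First name",
    ("fr", "en", "Envoyer"): "Send",
    ("fr", "de", "Envoyer"): "Senden",
}


def translate_with_fallback(
    text: str,
    source: str,
    target: str,
    machine_translate: Callable[[str, str, str], str],
) -> str:
    """Return the fixed translation if one exists, else call the MT backend."""
    cleaned = text.strip()
    fixed = UI_STRINGS.get((source, target, cleaned))
    if fixed is not None:
        return fixed
    # Longer or unknown text: delegate to whatever backend is plugged in
    # (a local model, Google Translate, DeepL, ...).
    return machine_translate(cleaned, source, target)
```

The caller still sees one function, while the strings that matter most for the interface never depend on model quality.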

John6666 · March 23, 2025, 2:48pm

Well, when it comes to creating APIs for the web, even if it’s aimed at beginners, it can be a little difficult…

Methos02 · March 24, 2025, 9:37am

The API works, only the translations are bad ^^
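
One likely contributor to the poor output is the "translate French to German: " prefix in the code posted above. That prompt style belongs to T5-type models; M2M100 treats the prefix as more text to translate and takes its languages from tokenizer settings instead. The source language was also never set, so the tokenizer falls back to its default. The following is a minimal sketch of the same loop without the prefix, assuming `source` and `target` are M2M100 language codes such as "fr" and "de". The function names are hypothetical.

```python
from typing import Dict


def load_m2m100(model_name: str = "facebook/m2m100_418M"):
    """Load the same pipeline as in the original code (downloads the model)."""
    from transformers import pipeline

    return pipeline(task="text2text-generation", model=model_name)


def translate_texts(pipe, texts: Dict[str, str], source: str, target: str) -> Dict[str, str]:
    """Translate every value in `texts`, keeping the keys."""
    # M2M100 reads the source language from the tokenizer, not from a prompt prefix.
    pipe.tokenizer.src_lang = source
    # Forcing the first generated token to the target language code
    # selects the output language.
    target_id = pipe.tokenizer.get_lang_id(target)

    translations = {}
    for key, text in texts.items():
        result = pipe(text.strip(), forced_bos_token_id=target_id)
        translations[key] = result[0]["generated_text"]
    return translations
```

Inside the FastAPI endpoint this would be called as `translate_texts(pipe, request.texts, request.source, request.target)`. Note that setting `src_lang` on a shared tokenizer is not safe if several requests with different source languages run at the same time, so a lock or one pipeline per source language may be needed.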

John6666 · March 24, 2025, 10:06am

I think it’s often difficult to get satisfactory performance with the base model as it is. It’s a good idea to try out various models and, if they don’t seem to work, either fine-tune them or use a ready-made service.
https://huggingface.co/models?pipeline_tag=text2text-generation&library=transformers&sort=trending
https://huggingface.co/models?pipeline_tag=translation&library=transformers&sort=trending

Methos02 · March 28, 2025, 1:49pm

Thanks for your answer, I will look ASAP! I have some bugs to fix on the main branch ^^’
