Best model for translating English to Japanese

I am working on a project to translate text from English to Japanese.
I’ve read about llama being trained on Japanese text but I am not sure if there are any better models out there. Any suggestions please?

2 Likes

I tried the Helsinki-NLP model: Helsinki-NLP/opus-mt-en-jap · Hugging Face.
I tried translating a few words and numbers in English to Japanese and didn’t get a good result. Google translate and Deepl translate gave me correct results. I don’t know why, but the words that I gave as input for translation were pretty simple and doesn’t look like the model needs to have domain knowledge for this simple task. If you come across or have come across a better model for English to Japanese translation, please reply to this thread.

1 Like

The LLM with a parameter count of around 7B to 14B is the easiest to use in my experience. Gemma2, Qwen 2.5 Instruct, and Mistral NEMO are good for Japanese and English.
Of these, Gemma2 has a small (2B) model that has been trained specifically for Japanese, so you might want to try that.
I think that if there were a model specifically for translation, it would be even smaller, but I don’t know much about specialized models.

Edit:
I am a Japanese speaker with an LLM, so please feel free to ask me about anything related to Japanese. There are many (presumably) Japanese people on Hub, but they are often working independently, so it is difficult to find them.
I am also working independently.

Hi John, I am just learning Japanese for fun and I thought about building a pipeline that I could use to practice my speaking, Currently I have a microphone module (from my telephone feeding audio data into a whisper ASR, I was particulary surprised when it spit out kanji and hiragana so I wanted to take it a bit further. I figured WhisperSpeech might be a suitable TTS to train in Japanese, but I know absolute ZERO about doing that. I like this LLM that understands Japanese as well and is not limited to only Latin based tokens. Might you have any tips for me?

1 Like

I don’t often get the chance to work with speech models, but from a quick search, it seems that WhisperSpeech has a good reputation in Japan.
However, I’ve found something even better. It’s as follows. Even among native Japanese speakers, there are not many people who can pronounce it this beautifully.
It’s in the realm of voice actors or announcers… it may even be overkill for learning.:sweat_smile:

1 Like