I’m currently trying to fine-tune a model for domain-specific translation between English and Japanese. I know Gemma has a 2B model specifically for Japanese, and Qwen supposedly has good performance. What are the best LLMs for this task? Would a Japanese-pretrained 2B model work better than a 7B or 13B base model, given that I’ll only be fine-tuning, with no further pretraining?
Would a Japanese pretrained 2B model work better than a 7B or 13B base model?
The Japanese-tuned version of Gemma 2 2B is a good model, but the extra Japanese training is not enough to overcome the size difference. If you need performance, use a model of 7B or larger.
What are the best LLMs for this task?
As you mentioned, Qwen 2.5 7B or 14B and their fine-tunes perform well in Japanese. (The standard version is decent, but it leans toward Chinese and English, so fine-tuning is necessary.)
Qwen 3 also seems promising. Gemma 2 and Gemma 3 are strong in non-English languages like Japanese.
For smaller models, some people occasionally fine-tune Qwen 2.5 0.5B or 1.5B for translation tasks.
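Whichever base model you pick, most of the fine-tuning prep is just getting your parallel corpus into the chat-style format the trainer expects. A minimal sketch, assuming the common `messages` JSONL layout that most SFT trainers (e.g. TRL, Axolotl) accept; the exact field names for your toolchain may differ, and the system prompt here is just an example:

```python
import json

def make_sft_record(en_text: str, ja_text: str) -> dict:
    """Wrap one English->Japanese pair in the chat-style messages format."""
    return {
        "messages": [
            {"role": "system",
             "content": "Translate the following English text into Japanese."},
            {"role": "user", "content": en_text},
            {"role": "assistant", "content": ja_text},
        ]
    }

def write_jsonl(pairs, path):
    """Dump (en, ja) pairs as one JSON object per line, UTF-8 so Japanese
    text stays readable in the file."""
    with open(path, "w", encoding="utf-8") as f:
        for en, ja in pairs:
            f.write(json.dumps(make_sft_record(en, ja), ensure_ascii=False) + "\n")
```

Keeping the instruction in a fixed system prompt (rather than varying it per record) makes it easy to reuse the exact same prompt at inference time.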
In any case, there are many models that can translate fairly accurately, but the writing style and other aspects vary significantly between models. After identifying a few promising models from leaderboards or similar sources, it’s best to test them out using Ollama or similar tools and compare the results.
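For the side-by-side comparison, a short script against Ollama's local REST API keeps things reproducible. A sketch, assuming a running Ollama server at the default port with the listed models already pulled (swap in whichever model tags you actually have):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model: str, text: str) -> dict:
    """Build a non-streaming /api/generate request for one translation."""
    return {
        "model": model,
        "prompt": f"Translate the following English text into Japanese:\n\n{text}",
        "stream": False,
    }

def translate_with(model: str, text: str) -> str:
    """Send the request to the local Ollama server and return the completion."""
    data = json.dumps(build_payload(model, text)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server with these models pulled):
#   for m in ["qwen2.5:7b", "gemma2:9b"]:
#       print(f"--- {m} ---")
#       print(translate_with(m, "The patient presented with chest pain."))
```

Running the same handful of domain sentences through each candidate this way makes the style differences between models easy to eyeball.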
Hey this is super helpful, thanks so much for the response.
Speaking of which, I just remembered that there was a Qwen trained by a Japanese company.
Since LLMs are generally good at English from the start, it might be faster to search for models that are well reviewed by Japanese users, or to follow models that Japanese users have liked. While I can’t guarantee they will excel at translation tasks specifically, Japanese language ability and English-Japanese translation ability generally correlate.
Another issue is each model’s vocabulary: whether it knows specific terms in certain specialized fields, and the variety of expressions and phrasing it uses.
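One cheap way to sanity-check that vocabulary point is to run a few domain sentences through a candidate model and see whether its output uses the Japanese renderings from your own terminology glossary. A minimal sketch (the glossary entries below are made-up medical-domain examples, not from any real dataset):

```python
def glossary_hits(translation: str, glossary: dict) -> dict:
    """For each English term with a known expected Japanese rendering,
    record whether the model's output actually used that rendering."""
    return {en: (ja in translation) for en, ja in glossary.items()}

# Hypothetical domain glossary: English term -> expected Japanese rendering.
glossary = {
    "myocardial infarction": "心筋梗塞",
    "hypertension": "高血圧",
}

# A model's (example) output for "The patient presented with acute
# myocardial infarction."
output = "患者は急性心筋梗塞を呈した。"

hits = glossary_hits(output, glossary)
```

Substring matching is crude (it misses inflected or paraphrased renderings), but averaged over a few dozen sentences per model it gives a quick ranking of which candidates already know your field's terminology before you invest in fine-tuning.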
Additionally, models trained without prohibited-term restrictions can be risky for chatbot purposes, but they can be convenient for translation tasks, since they are less likely to refuse or censor the source text.
Got it, appreciate the suggestions. I hadn’t considered the vocabulary aspect, which is definitely going to be important for specialized terminology. Will test out the different models and share my updates. Thanks again!