What architecture to use to classify English-to-Japanese translations

Hi everyone,

I am currently diving into Transformers and the Hugging Face ecosystem. As an interesting test project I want to build a classifier that, given an English and a Japanese sentence, decides whether the English sentence is a translation of the Japanese one (or vice versa).
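One common framing for this is a cross-encoder: pack both sentences into a single input for one multilingual encoder and put a 2-way classification head on top. A minimal sketch of the wiring, using a tiny randomly initialised config so it runs standalone (the model sizes, and the suggestion of XLM-RoBERTa as the backbone, are my assumptions, not something from the thread):

```python
import torch
from transformers import XLMRobertaConfig, XLMRobertaForSequenceClassification

# Tiny randomly initialised config so the sketch runs offline; in practice you
# would load a pretrained multilingual checkpoint such as "xlm-roberta-base"
# via AutoModelForSequenceClassification.from_pretrained(..., num_labels=2).
config = XLMRobertaConfig(
    vocab_size=1000, hidden_size=64, num_hidden_layers=2,
    num_attention_heads=2, intermediate_size=128, num_labels=2,
)
model = XLMRobertaForSequenceClassification(config)

# Dummy token ids standing in for tokenizer(english_sentence, japanese_sentence);
# the tokenizer packs both sentences into one sequence with separator tokens.
input_ids = torch.randint(0, config.vocab_size, (1, 16))
logits = model(input_ids=input_ids).logits
print(logits.shape)  # one score per class: translation pair / not a pair
```

Fine-tuned on labelled positive pairs plus mismatched negative pairs, the head learns to separate the two classes; the pretrained multilingual backbone does most of the cross-lingual heavy lifting.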

I already tried a few things, like using LoRA to fine-tune Llama models, and training a Llama model (in a smaller configuration) from scratch.

Currently I am trying an encoder-decoder architecture with a pretrained RoBERTa as the encoder and a newly initialized RoBERTa as the decoder. I freeze the encoder and only train the decoder.
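In Hugging Face terms, that frozen-encoder setup looks roughly like the sketch below (tiny randomly initialised configs so it runs standalone; the actual post presumably paired a pretrained `roberta-base` encoder with a fresh decoder config instead):

```python
import torch
from transformers import EncoderDecoderConfig, EncoderDecoderModel, RobertaConfig

# Small illustrative configs; a real run would use a pretrained RoBERTa
# encoder checkpoint and a freshly initialised decoder.
enc_cfg = RobertaConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
                        num_attention_heads=2, intermediate_size=128)
dec_cfg = RobertaConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
                        num_attention_heads=2, intermediate_size=128,
                        is_decoder=True, add_cross_attention=True)
model = EncoderDecoderModel(
    config=EncoderDecoderConfig.from_encoder_decoder_configs(enc_cfg, dec_cfg)
)

# Freeze the encoder so only the decoder receives gradient updates.
for p in model.encoder.parameters():
    p.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # decoder (and cross-attention) parameters only
```

Note that an encoder-decoder like this is a generative architecture; for a yes/no pair decision, a classification head on an encoder is usually the more direct fit.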

None of these strategies has given me any acceptable results. The encoder-decoder was the weirdest one, not really training at all currently. The loss looks like this:

and that is only after training for several hundred epochs on a small subset of the dataset. With fewer epochs I do not see any actual training happening.

I am kind of out of ideas at this point. Is there something I am completely missing or forgetting?

What architectures would be recommended for this kind of task?

Also, is Transformers simply the wrong tool here? Are there other NNs or tools I could use instead to solve this problem?
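One alternative worth sketching here is a bi-encoder: embed each sentence separately with a multilingual sentence encoder (for example LaBSE through the sentence-transformers library; that model choice is a suggestion, not something from the thread) and threshold the cosine similarity of the two embeddings. The decision step, shown with placeholder vectors and an illustrative threshold:

```python
import torch
import torch.nn.functional as F

def is_translation_pair(emb_en: torch.Tensor, emb_ja: torch.Tensor,
                        threshold: float = 0.7) -> bool:
    """Decide pair / non-pair from the cosine similarity of two embeddings."""
    sim = F.cosine_similarity(emb_en, emb_ja, dim=-1)
    return bool(sim.item() >= threshold)

# Placeholder embeddings; in practice they would come from a multilingual
# sentence encoder, e.g. model.encode(sentence) in sentence-transformers.
matched = torch.tensor([1.0, 0.0, 0.0])
unrelated = torch.tensor([0.0, 1.0, 0.0])
print(is_translation_pair(matched, matched))    # True  (similarity 1.0)
print(is_translation_pair(matched, unrelated))  # False (similarity 0.0)
```

The appeal of this setup is that no generation is involved at all, and encoders trained on parallel corpora already place translations close together, so a simple threshold (or a small trained head on top of the two embeddings) can go a long way.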

Thank you to everyone who can help me with this one, or point me in the right direction ^^

Hey @Nazzaroth2

Classification tasks for Japanese/English (and vice versa) are a very tricky project to start with, due to the peculiarities of the Japanese language. You might be interested in reading the following links.

Good luck!


Thank you for the links, they look very interesting for background information and experiment options. ^^ I have made some progress with an encoder-only approach now, after rereading how Transformers actually work, so there might be some light at the end of the tunnel for this project after all XD