What architecture should I use to classify English-Japanese translation pairs?

Hi everyone,

I am currently diving into Transformers and the Hugging Face ecosystem. As an interesting test project I want to build a classifier that, given an English and a Japanese sentence, can decide whether the English sentence is a translation of the Japanese one (or vice versa).
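
For concreteness, the shape of what I am after looks roughly like this (a minimal sketch assuming a multilingual cross-encoder such as xlm-roberta-base; the model choice, example sentences, and label meanings are placeholders, not a tested recipe):

```python
# Sketch: treat the task as sentence-pair classification with a multilingual
# encoder. Model name and labels are assumptions for illustration only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2  # 0 = not a translation, 1 = translation
)

en = "The cat sat on the mat."
ja = "猫がマットの上に座った。"

# Encode both sentences as one sequence pair so the encoder attends across both.
inputs = tokenizer(en, ja, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # the head is untrained, so fine-tuning comes first
```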

I have already tried a few things, like using LoRA to fine-tune Llama models, and training a Llama model (in a smaller configuration) from scratch.
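
For the LoRA attempt, the setup was roughly along these lines (a sketch assuming the peft library; the model name is a gated-weights placeholder and the hyperparameters are illustrative, not the exact values I ran):

```python
# Sketch: LoRA adapters on a Llama model for sequence-pair classification.
# Hyperparameters are illustrative guesses, not tuned values.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-2-7b-hf", num_labels=2  # placeholder model name
)
base.config.pad_token_id = base.config.eos_token_id  # Llama has no pad token

config = LoraConfig(
    task_type=TaskType.SEQ_CLS,           # keeps the classification head trainable
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # Llama attention projections
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapters + head get gradients
```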

Most recently I tried an encoder-decoder architecture with a pretrained RoBERTa as the encoder and a newly initialized RoBERTa decoder. I freeze the encoder and only train the decoder.
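
In code, that setup looks roughly like this (a sketch assuming transformers' EncoderDecoderModel; the config values are library defaults, not my tuned settings):

```python
# Sketch: pretrained RoBERTa encoder, randomly initialized RoBERTa decoder,
# encoder frozen so only the decoder receives gradient updates.
from transformers import (
    EncoderDecoderModel,
    RobertaConfig,
    RobertaForCausalLM,
    RobertaModel,
)

encoder = RobertaModel.from_pretrained("roberta-base")

decoder_config = RobertaConfig(is_decoder=True, add_cross_attention=True)
decoder = RobertaForCausalLM(decoder_config)  # fresh weights, no pretraining

model = EncoderDecoderModel(encoder=encoder, decoder=decoder)
model.config.decoder_start_token_id = decoder_config.bos_token_id
model.config.pad_token_id = decoder_config.pad_token_id

# Freeze the encoder.
for param in model.encoder.parameters():
    param.requires_grad = False
```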

None of these strategies has given me acceptable results. The encoder-decoder was the weirdest one, not really training at all at the moment. The loss looks like this:

[loss curve screenshot]

and that is only after training for several hundred epochs on a small subset of the dataset. If I use fewer epochs I do not see any actual training happening.

I am kind of out of ideas at this point. Is there something I am completely missing or forgetting?

What architectures would be recommended for this kind of task?

Also, are Transformers simply the wrong tool here? Are there other neural networks or tools I could use instead to solve this problem?

Thank you to everyone who can help me with this one, or point me in the right direction ^^

Hey @Nazzaroth2

Classification tasks for Japanese/English (and vice versa) make for a very tricky starter project due to the specifics of the Japanese language. You might be interested in reading the following links.

Good luck!

Akim

Thank you for the links, they look very interesting for background information and experiment options. ^^ I have made some forward steps with an encoder-only setup now, after rereading how Transformers actually work, so there might be some light at the end of the tunnel for this project after all XD
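
For anyone finding this later, the encoder-only direction I mean looks roughly like this (a sketch using a bi-encoder, assuming the sentence-transformers library with the multilingual LaBSE model; the similarity threshold is an arbitrary placeholder, not a tuned value):

```python
# Sketch: embed both sentences with a multilingual bi-encoder and compare
# them with cosine similarity. The 0.7 threshold is a placeholder.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/LaBSE")

en = "The cat sat on the mat."
ja = "猫がマットの上に座った。"

# LaBSE maps both languages into one embedding space, so high cosine
# similarity between the two sentence vectors suggests a translation pair.
emb = model.encode([en, ja], convert_to_tensor=True)
score = util.cos_sim(emb[0], emb[1]).item()
print(score > 0.7)  # True -> likely a translation pair
```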