What architecture to use to classify English-to-Japanese translations

Hi everyone,

I am currently diving into Transformers and the Hugging Face ecosystem. As an interesting test project I want to build a classifier that, given an English and a Japanese sentence, decides whether the English sentence is a translation of the Japanese one (or vice versa).
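One common framing for this is a cross-encoder: pack both sentences into a single input for one multilingual encoder and put a 2-way classification head on top. A minimal sketch of the wiring, using a tiny randomly initialised config so it runs standalone (the model sizes, and the suggestion of XLM-RoBERTa as the backbone, are my assumptions, not something from the thread):

```python
import torch
from transformers import XLMRobertaConfig, XLMRobertaForSequenceClassification

# Tiny randomly initialised config so the sketch runs offline; in practice you
# would load a pretrained multilingual checkpoint such as "xlm-roberta-base"
# via AutoModelForSequenceClassification.from_pretrained(..., num_labels=2).
config = XLMRobertaConfig(
    vocab_size=1000, hidden_size=64, num_hidden_layers=2,
    num_attention_heads=2, intermediate_size=128, num_labels=2,
)
model = XLMRobertaForSequenceClassification(config)

# Dummy token ids standing in for tokenizer(english_sentence, japanese_sentence);
# the tokenizer packs both sentences into one sequence with separator tokens.
input_ids = torch.randint(0, config.vocab_size, (1, 16))
logits = model(input_ids=input_ids).logits
print(logits.shape)  # one score per class: translation pair / not a pair
```

Fine-tuned on labelled positive pairs plus mismatched negative pairs, the head learns to separate the two classes; the pretrained multilingual backbone does most of the cross-lingual heavy lifting.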

I already tried a few things, like using LoRA to fine-tune Llama models, and training a Llama model (in a smaller configuration) from scratch.

Currently I am trying an encoder-decoder architecture with a pretrained RoBERTa as the encoder and a newly initialized RoBERTa as the decoder. I freeze the encoder and only train the decoder.
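In Hugging Face terms, that frozen-encoder setup looks roughly like the sketch below (tiny randomly initialised configs so it runs standalone; the actual post presumably paired a pretrained `roberta-base` encoder with a fresh decoder config instead):

```python
import torch
from transformers import EncoderDecoderConfig, EncoderDecoderModel, RobertaConfig

# Small illustrative configs; a real run would use a pretrained RoBERTa
# encoder checkpoint and a freshly initialised decoder.
enc_cfg = RobertaConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
                        num_attention_heads=2, intermediate_size=128)
dec_cfg = RobertaConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
                        num_attention_heads=2, intermediate_size=128,
                        is_decoder=True, add_cross_attention=True)
model = EncoderDecoderModel(
    config=EncoderDecoderConfig.from_encoder_decoder_configs(enc_cfg, dec_cfg)
)

# Freeze the encoder so only the decoder receives gradient updates.
for p in model.encoder.parameters():
    p.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # decoder (and cross-attention) parameters only
```

Note that an encoder-decoder like this is a generative architecture; for a yes/no pair decision, a classification head on an encoder is usually the more direct fit.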

None of these strategies has given me any acceptable results. The encoder-decoder was the weirdest one, not really training at all currently. The loss looks like this:

and that is only after training for several hundred epochs on a small subset of the dataset. With fewer epochs I do not see any actual training happening.

I am kind of out of ideas at this point. Is there something I am completely missing or forgetting?

What architectures would be recommended for this kind of task?

Also, is Transformers simply the wrong tool here? Are there other NNs or tools I could use instead to solve this problem?
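One alternative worth sketching here is a bi-encoder: embed each sentence separately with a multilingual sentence encoder (for example LaBSE through the sentence-transformers library; that model choice is a suggestion, not something from the thread) and threshold the cosine similarity of the two embeddings. The decision step, shown with placeholder vectors and an illustrative threshold:

```python
import torch
import torch.nn.functional as F

def is_translation_pair(emb_en: torch.Tensor, emb_ja: torch.Tensor,
                        threshold: float = 0.7) -> bool:
    """Decide pair / non-pair from the cosine similarity of two embeddings."""
    sim = F.cosine_similarity(emb_en, emb_ja, dim=-1)
    return bool(sim.item() >= threshold)

# Placeholder embeddings; in practice they would come from a multilingual
# sentence encoder, e.g. model.encode(sentence) in sentence-transformers.
matched = torch.tensor([1.0, 0.0, 0.0])
unrelated = torch.tensor([0.0, 1.0, 0.0])
print(is_translation_pair(matched, matched))    # True  (similarity 1.0)
print(is_translation_pair(matched, unrelated))  # False (similarity 0.0)
```

The appeal of this setup is that no generation is involved at all, and encoders trained on parallel corpora already place translations close together, so a simple threshold (or a small trained head on top of the two embeddings) can go a long way.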

Thank you to everyone who can help me with this one, or point me in the right direction ^^

Hey @Nazzaroth2

Classification tasks for Japanese/English (and vice versa) are a very tricky project to start with, due to the peculiarities of the Japanese language. You might be interested in reading the following links.

Good luck!


Thank you for the links, they look very interesting for background information and experiment options. ^^ I have made some progress with an encoder-only approach now, after rereading how Transformers actually work, so there might be some light at the end of the tunnel for this project after all XD