I mentioned before that I got it to work, but it seems that while inference works perfectly, training doesn’t. When I try to train the model I get completely garbage results and I can’t really tell why. The training loss and validation loss are extremely small (around 2e-3) but the ROUGE scores I’m calculating are also abysmal (approx 2e-4). Lastly, training the model for one epoch makes it completely forget how to translate between the two languages I have it pruned for.