Hi, I’ve not tried seq-to-seq (I’ve been using BERT), and I’m not an expert, but I have a few suggestions.
I suggest you don’t train from scratch. Brazilian Portuguese should be very close to standard Portuguese, and Catalonian Spanish is probably quite close to standard Spanish. Much closer than randomly-initialized weights would be, anyway.
I suggest you start by fine-tuning on a much smaller sample of the data, so that you can find out where the problems are and settle on suitable hyperparameters before committing to a long run.
What do you suppose is happening at the 17-hour and 35-hour marks? Is someone else sharing your system?
If you want to train for a while, stop, and later restart from the same point, you can save the model state-dict and the optimizer state-dict.
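Something like this is the usual pattern (a minimal sketch with a stand-in `nn.Linear` model; substitute your own seq2seq model and optimizer, and add whatever extras you need in the dict, e.g. the LR scheduler state or current epoch):

```python
import torch
import torch.nn as nn

# Stand-in model just for illustration; use your real model here.
model = nn.Linear(10, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Save both state-dicts together (plus anything else you'll need to resume).
checkpoint = {
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "epoch": 5,
}
torch.save(checkpoint, "checkpoint.pt")

# Later: rebuild the objects, then restore their state and keep training.
model2 = nn.Linear(10, 10)
optimizer2 = torch.optim.Adam(model2.parameters(), lr=1e-3)
ckpt = torch.load("checkpoint.pt")
model2.load_state_dict(ckpt["model_state_dict"])
optimizer2.load_state_dict(ckpt["optimizer_state_dict"])
```

Saving the optimizer state matters for optimizers like Adam, which keep per-parameter moment estimates; without it, resuming effectively restarts the optimizer cold.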
I suggest you run the validation less frequently.
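For example, instead of validating every epoch (or every step), gate it on a step counter. This is just a sketch; `val_every`, `total_steps`, and `run_validation` are hypothetical names for your own loop:

```python
validation_steps = []

def run_validation(step):
    # Placeholder for your actual validation loop.
    validation_steps.append(step)

val_every = 500     # hypothetical interval; tune to your setup
total_steps = 2000

for step in range(1, total_steps + 1):
    # ... one training step here ...
    if step % val_every == 0:
        run_validation(step)
```

With `val_every = 500`, validation runs only 4 times over 2000 steps, which can save a lot of wall-clock time on a long run.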