Hi, I’ve not tried seq-to-seq (I’ve been using BERT), and I’m not an expert, but I have a few suggestions.
I suggest you don’t train from scratch. Brazilian Portuguese should be very close to standard Portuguese, and Catalonian Spanish is probably quite close to standard Spanish. Much closer than randomly-initialized weights would be, anyway.
I suggest you start by fine-tuning on a much smaller sample of the data, so that you can find out where the problems are and settle on suitable hyperparameters before committing to a long run.
What do you suppose is happening at the 17-hour and 35-hour marks? Is someone else sharing your system?
If you want to train for a while, stop, and later restart from the same point, you can save the model state-dict and the optimizer state-dict.
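Something like this is the usual pattern (a minimal sketch with a stand-in `nn.Linear` model; substitute your own seq2seq model and optimizer, and add whatever extras you need in the dict, e.g. the LR scheduler state or current epoch):

```python
import torch
import torch.nn as nn

# Stand-in model just for illustration; use your real model here.
model = nn.Linear(10, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Save both state-dicts together (plus anything else you'll need to resume).
checkpoint = {
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "epoch": 5,
}
torch.save(checkpoint, "checkpoint.pt")

# Later: rebuild the objects, then restore their state and keep training.
model2 = nn.Linear(10, 10)
optimizer2 = torch.optim.Adam(model2.parameters(), lr=1e-3)
ckpt = torch.load("checkpoint.pt")
model2.load_state_dict(ckpt["model_state_dict"])
optimizer2.load_state_dict(ckpt["optimizer_state_dict"])
```

Saving the optimizer state matters for optimizers like Adam, which keep per-parameter moment estimates; without it, resuming effectively restarts the optimizer cold.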
I suggest you run the validation less frequently.
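For example, instead of validating every epoch (or every step), gate it on a step counter. This is just a sketch; `val_every`, `total_steps`, and `run_validation` are hypothetical names for your own loop:

```python
validation_steps = []

def run_validation(step):
    # Placeholder for your actual validation loop.
    validation_steps.append(step)

val_every = 500     # hypothetical interval; tune to your setup
total_steps = 2000

for step in range(1, total_steps + 1):
    # ... one training step here ...
    if step % val_every == 0:
        run_validation(step)
```

With `val_every = 500`, validation runs only 4 times over 2000 steps, which can save a lot of wall-clock time on a long run.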