This is no easy feat, I know it first hand as I am doing something similar with BERT pre-training from scratch. Any reason why you didn’t use HF Trainer?
This is no easy feat, I know it first hand as I am doing something similar with BERT pre-training from scratch. Any reason why you didn’t use HF Trainer?