yl-to
My ALBERT model, pre-trained from scratch, can't get its training loss to converge toward 0, even on WikiText-2.
- The training loss converges at around 6.6 when using AlbertForMaskedLM as the model class
- The training loss goes negative when using AlbertForPretrain as the model class
Note: I deliberately set the eval dataset to be the same as the training set so I could check the training loss on the last run.
I also raised an issue here:
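For reference, a minimal sketch of this kind of from-scratch ALBERT MLM pre-training setup on WikiText-2 (the tokenizer checkpoint, sequence length, and hyperparameters below are illustrative assumptions, not the actual values used in the run described above):

```python
# Minimal sketch: ALBERT MLM pre-training from scratch on WikiText-2.
# Hyperparameters and tokenizer checkpoint are assumptions for illustration.
from datasets import load_dataset
from transformers import (
    AlbertConfig,
    AlbertForMaskedLM,
    AlbertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Reuse the pretrained ALBERT tokenizer; only the model weights start from scratch.
tokenizer = AlbertTokenizerFast.from_pretrained("albert-base-v2")

# Randomly initialized ALBERT model for masked-LM pre-training.
config = AlbertConfig(vocab_size=tokenizer.vocab_size)
model = AlbertForMaskedLM(config)

raw = load_dataset("wikitext", "wikitext-2-raw-v1")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Standard MLM collator: masks 15% of input tokens.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="albert-scratch",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=1e-4,  # the reply below suggests the learning rate is worth revisiting
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["train"],  # eval set deliberately equals the training set, as noted above
    data_collator=collator,
)
trainer.train()
```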
sxdmit
I believe it might be due to an inappropriate learning rate setting or the size of your dataset; you should take a careful look at your training setup.