Pre-Train BERT (from scratch)

I don’t have an evaluation set up yet. I am still setting up these training pipelines. I asked about metrics in the “Evaluation metrics for BERT-like LMs” thread but have had no response so far. I read at https://huggingface.co/transformers/perplexity.html and elsewhere that perplexity is not an appropriate metric for BERT and other masked language models. Can’t we use the fill-mask pipeline and some form of masked-token accuracy instead? A rough sketch of what I mean is below.
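Here is a minimal sketch of that idea: top-1 masked-token accuracy computed with the fill-mask pipeline. The `./my-bert` path and the two whitespace-tokenized sentences are just placeholders, and words that split into multiple subwords can’t be recovered from a single mask, so this is only a rough sanity check rather than a proper metric:

```python
# Rough top-1 masked-token accuracy using the transformers fill-mask pipeline.
# "./my-bert" and the sentences below are placeholders -- swap in your own
# checkpoint and a held-out set of (whitespace-tokenized) sentences.
from transformers import pipeline

fill = pipeline("fill-mask", model="./my-bert")
mask = fill.tokenizer.mask_token  # e.g. "[MASK]" for BERT

sentences = [
    "The capital of France is Paris .",
    "Water boils at one hundred degrees Celsius .",
]

hits, total = 0, 0
for sent in sentences:
    words = sent.split()
    for i, word in enumerate(words):
        # Mask one word at a time and ask the model to fill it back in.
        masked = " ".join(words[:i] + [mask] + words[i + 1:])
        top = fill(masked)[0]  # predictions are sorted by score, take the best
        hits += int(top["token_str"].strip().lower() == word.lower())
        total += 1

print(f"masked-token top-1 accuracy: {hits / total:.3f}")
```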

OTOH, I’ve already set up GLUE benchmarks with https://jiant.info/ v2 Alpha. It has excellent integration with transformers, and you can easily plug in any model and run the benchmarks in parallel. See https://github.com/jiant-dev/jiant/tree/master/examples for more details.
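For reference, this is roughly what a single-task run looks like with jiant’s simple API. The paths and checkpoint are placeholders, and the `RunConfiguration` argument names may differ between jiant releases (I’m on the v2 Alpha), so double-check against the examples directory linked above:

```python
# Sketch: run one GLUE task (MRPC) through jiant's simple API.
# Paths and the checkpoint are placeholders; verify argument names
# against the jiant examples for your installed version.
import jiant.scripts.download_data.runscript as downloader
from jiant.proj.simple import runscript as run

EXP_DIR = "/path/to/exp"

# Download the task data once.
downloader.download_data(["mrpc"], f"{EXP_DIR}/tasks")

# Point jiant at a transformers checkpoint (local path or hub name) and a task.
args = run.RunConfiguration(
    run_name="bert_mrpc",
    exp_dir=EXP_DIR,
    data_dir=f"{EXP_DIR}/tasks",
    hf_pretrained_model_name_or_path="./my-bert",
    tasks="mrpc",
    train_batch_size=16,
    num_train_epochs=3,
)
run.run_simple(args)
```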