Pre-Train BERT (from scratch)

I don’t have an evaluation set up yet. I am still setting up these training pipelines. I asked about metrics in the “Evaluation metrics for BERT-like LMs” thread but have had no response so far. I read at https://huggingface.co/transformers/perplexity.html and elsewhere that perplexity is not an appropriate metric for BERT and other masked language models. Can’t we use the fill-mask pipeline and some form of masked-token accuracy instead? A rough sketch of what I mean is below.
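Here is a minimal sketch of that idea: top-1 masked-token accuracy computed with the fill-mask pipeline. The `./my-bert` path and the two whitespace-tokenized sentences are just placeholders, and words that split into multiple subwords can’t be recovered from a single mask, so this is only a rough sanity check rather than a proper metric:

```python
# Rough top-1 masked-token accuracy using the transformers fill-mask pipeline.
# "./my-bert" and the sentences below are placeholders -- swap in your own
# checkpoint and a held-out set of (whitespace-tokenized) sentences.
from transformers import pipeline

fill = pipeline("fill-mask", model="./my-bert")
mask = fill.tokenizer.mask_token  # e.g. "[MASK]" for BERT

sentences = [
    "The capital of France is Paris .",
    "Water boils at one hundred degrees Celsius .",
]

hits, total = 0, 0
for sent in sentences:
    words = sent.split()
    for i, word in enumerate(words):
        # Mask one word at a time and ask the model to fill it back in.
        masked = " ".join(words[:i] + [mask] + words[i + 1:])
        top = fill(masked)[0]  # predictions are sorted by score, take the best
        hits += int(top["token_str"].strip().lower() == word.lower())
        total += 1

print(f"masked-token top-1 accuracy: {hits / total:.3f}")
```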

OTOH, I’ve already set up GLUE benchmarks with https://jiant.info/ v2 Alpha. It has excellent integration with transformers, and you can easily plug in any model and run the benchmarks in parallel. See https://github.com/jiant-dev/jiant/tree/master/examples for more details.
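For reference, this is roughly what a single-task run looks like with jiant’s simple API. The paths and checkpoint are placeholders, and the `RunConfiguration` argument names may differ between jiant releases (I’m on the v2 Alpha), so double-check against the examples directory linked above:

```python
# Sketch: run one GLUE task (MRPC) through jiant's simple API.
# Paths and the checkpoint are placeholders; verify argument names
# against the jiant examples for your installed version.
import jiant.scripts.download_data.runscript as downloader
from jiant.proj.simple import runscript as run

EXP_DIR = "/path/to/exp"

# Download the task data once.
downloader.download_data(["mrpc"], f"{EXP_DIR}/tasks")

# Point jiant at a transformers checkpoint (local path or hub name) and a task.
args = run.RunConfiguration(
    run_name="bert_mrpc",
    exp_dir=EXP_DIR,
    data_dir=f"{EXP_DIR}/tasks",
    hf_pretrained_model_name_or_path="./my-bert",
    tasks="mrpc",
    train_batch_size=16,
    num_train_epochs=3,
)
run.run_simple(args)
```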