Pre-Train BERT (from scratch)

Any progress here? It would be so convenient to train a BERT from scratch using datasets and transformers. Has anyone achieved results comparable to the original BERT?

Hi @BramVanroy, is there an example of pretraining BERT on the NSP task with dataset.map? Thanks!
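I haven't seen an official example, but one way to build NSP pairs with dataset.map is a batched function that emits sentence pairs and labels. This is only a sketch under my own assumptions: it assumes a "sentences" column holding one list of sentences per document, and the column names (sentence_a, sentence_b, next_sentence_label) are just my choices, not anything transformers requires.

```python
import random

def make_nsp_pairs(batch, seed=0):
    """Batched map function: turn lists of sentences (one list per
    document) into NSP pairs. Roughly half the pairs use the true next
    sentence (label 0), the rest a randomly drawn sentence (label 1),
    following BERT's convention that 0 means "is next".

    Note: the random branch can occasionally pick the true next
    sentence by chance; the original BERT code has the same quirk."""
    rng = random.Random(seed)
    firsts, seconds, labels = [], [], []
    docs = batch["sentences"]
    for doc in docs:
        for i in range(len(doc) - 1):
            if rng.random() < 0.5:
                firsts.append(doc[i])
                seconds.append(doc[i + 1])
                labels.append(0)
            else:
                firsts.append(doc[i])
                seconds.append(rng.choice(rng.choice(docs)))
                labels.append(1)
    return {"sentence_a": firsts,
            "sentence_b": seconds,
            "next_sentence_label": labels}
```

With a datasets.Dataset it would plug in as `dataset.map(make_nsp_pairs, batched=True, remove_columns=["sentences"])`; with batched=True the function is allowed to return a different number of rows than it received, which is what makes pair generation possible.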

Hi @vblagoje, I found that the file_path param of TextDatasetForNextSentencePrediction accepts only a single file. Does that mean I need to merge all my datasets into one file when splitting sentences? That file would be too big.

To chunk the articles, you can check https://huggingface.co/docs/datasets/processing.html#augmenting-the-dataset

The new link is https://huggingface.co/docs/datasets/process#data-augmentation
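For a concrete version of the chunking the docs describe, here is a minimal sketch, assuming a "text" column and a simple whitespace split (real pipelines would usually chunk by tokenizer tokens instead):

```python
def chunk_articles(batch, chunk_size=128):
    """Batched map function: split each article into chunks of at most
    chunk_size words, so no single example gets too long. Because the
    output can have more rows than the input, this only works with
    batched=True."""
    chunks = []
    for article in batch["text"]:
        words = article.split()
        for i in range(0, len(words), chunk_size):
            chunks.append(" ".join(words[i:i + chunk_size]))
    return {"text": chunks}
```

Applied as `dataset.map(chunk_articles, batched=True, remove_columns=dataset.column_names)`, this keeps everything inside a datasets.Dataset, so there is no need to write all articles into one giant file first.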