Train BERT from scratch using run_mlm.py

hxz116 · March 25, 2022, 3:18am

Following the Hugging Face example, I ran:

```bash
python run_mlm.py \
    --model_type bert \
    --tokenizer_name roberta-base \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --output_dir /tmp/mlm-full/
```

However, the script finished prematurely without training, printing:
```text
Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Masked Language Modeling', 'type': 'fill-mask'}, 'dataset': {'name': 'wikitext wikitext-2-raw-v1', 'type': 'wikitext', 'args': 'wikitext-2-raw-v1'}}
```
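That message appears to come from the model-card step that `run_mlm.py` runs at the very end, not from training itself. As far as I can tell from transformers' `modelcard.py`, a result entry is kept only when it has `task`, `dataset`, and `metrics` fields, and `metrics` is filled only from evaluation results. The logged dict has no `metrics` key, which fits with no evaluation (or training) having run. The sketch below is a simplified paraphrase of that check, not the library code; `keep_result` and the placeholder metric entry are hypothetical.

```python
# Simplified paraphrase of the model-card filtering step. Field names mirror
# the logged dict, but this is NOT the transformers source code.
REQUIRED_FIELDS = ("task", "dataset", "metrics")


def keep_result(result: dict) -> bool:
    """Return True only if every required model-card field is present."""
    return all(field in result for field in REQUIRED_FIELDS)


# The result from the log: it has "task" and "dataset" but no "metrics".
logged_result = {
    "task": {"name": "Masked Language Modeling", "type": "fill-mask"},
    "dataset": {
        "name": "wikitext wikitext-2-raw-v1",
        "type": "wikitext",
        "args": "wikitext-2-raw-v1",
    },
}

# Hypothetical placeholder: in a real run the metrics would come from evaluation.
with_metrics = dict(logged_result, metrics=["hypothetical metric entry"])

print(keep_result(logged_result))  # False: this is the "Dropping ..." case
print(keep_result(with_metrics))   # True
```

So the message itself is harmless; it only reflects that there were no evaluation results to put in the model card.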

How do I train from scratch? Are there any arguments I missed? I am a beginner.
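One likely cause: in `run_mlm.py`, training and evaluation are switched on by the `--do_train` and `--do_eval` flags from `TrainingArguments`, and both default to off. A command without them therefore sets everything up, skips both stages, and goes straight to writing the model card. The sketch below imitates that behaviour with plain `argparse`. The parser is a simplified stand-in (the real script uses `HfArgumentParser` with many more options), and `planned_steps` is a hypothetical helper.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Simplified stand-in for the run_mlm.py flags used in this thread.

    The real script builds its parser with HfArgumentParser; this only
    imitates how its boolean switches behave.
    """
    parser = argparse.ArgumentParser(prog="run_mlm.py")
    for name in ("--model_type", "--tokenizer_name",
                 "--dataset_name", "--dataset_config_name"):
        parser.add_argument(name)
    parser.add_argument("--output_dir", required=True)
    # Boolean switches default to False: leaving them out disables the stage.
    parser.add_argument("--do_train", action="store_true")
    parser.add_argument("--do_eval", action="store_true")
    return parser


def planned_steps(argv: list[str]) -> list[str]:
    """Hypothetical helper: list which stages the parsed flags would enable."""
    args = build_parser().parse_args(argv)
    steps = []
    if args.do_train:
        steps.append("train")
    if args.do_eval:
        steps.append("eval")
    return steps


original = [
    "--model_type", "bert",
    "--tokenizer_name", "roberta-base",
    "--dataset_name", "wikitext",
    "--dataset_config_name", "wikitext-2-raw-v1",
    "--output_dir", "/tmp/mlm-full/",
]
fixed = original + ["--do_train", "--do_eval"]

print(planned_steps(original))  # []
print(planned_steps(fixed))     # ['train', 'eval']
```

If that is the cause, the real invocation would be the original command with the two flags added, e.g. `python run_mlm.py --model_type bert --tokenizer_name roberta-base --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train --do_eval --output_dir /tmp/mlm-full/`.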

Related topics

- Fine tune Masked Language Model on custom dataset · Beginners · 5 replies · 6109 views · August 20, 2020
- How to train from scratch with run_mlm.py, .txt file? · Beginners · 20 replies · 6852 views · September 22, 2024
- Incremental Training using run_mlm.py · 🤗Transformers · 0 replies · 307 views · December 12, 2022
- Training RoBERTa from scratch: error? · 🤗Transformers · 0 replies · 596 views · August 26, 2021
- Fine-tune BERT for Masked Language Modeling · 🤗Transformers · 3 replies · 3051 views · January 25, 2021