Train BERT from scratch using run_mlm.py

Following the Hugging Face example, I ran:

python run_mlm.py \
    --model_type bert \
    --tokenizer_name roberta-base \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --output_dir /tmp/mlm-full/

However, the script finished prematurely without training:
Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Masked Language Modeling', 'type': 'fill-mask'}, 'dataset': {'name': 'wikitext wikitext-2-raw-v1', 'type': 'wikitext', 'args': 'wikitext-2-raw-v1'}}

How do I train from scratch? Are there any arguments I missed? I am a beginner.
