Hi @joval,
The HF docs for BertForMaskedLM describe its parameters and outputs.
You can train a BERT MLM from scratch with that class.
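For what it's worth, here is a minimal sketch of one from-scratch training step, not a full recipe. The tiny config values, the `MASK_ID` value, and the random token ids are all assumptions made up for illustration; in a real run you would use your own tokenizer and dataset, and something like `DataCollatorForLanguageModeling` to do the masking.

```python
import torch
from transformers import BertConfig, BertForMaskedLM

torch.manual_seed(0)

# Tiny, hypothetical configuration so the sketch runs quickly on CPU.
config = BertConfig(
    vocab_size=1000,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    max_position_embeddings=64,
)

# Building the model from a config (instead of from_pretrained) gives
# randomly initialised weights, i.e. training from scratch.
model = BertForMaskedLM(config)
model.train()

MASK_ID = 4  # hypothetical [MASK] id; take the real one from your tokenizer
batch_size, seq_len = 8, 16

# Random token ids stand in for a tokenized batch (assumption for illustration).
input_ids = torch.randint(5, config.vocab_size, (batch_size, seq_len))
labels = input_ids.clone()

# Mask roughly 15% of the positions and compute the loss only on those.
mask = torch.rand(input_ids.shape) < 0.15
mask[0, 0] = True              # make sure at least one position is masked
input_ids[mask] = MASK_ID
labels[~mask] = -100           # -100 positions are ignored by the loss

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Passing labels makes the model return the MLM loss along with the logits.
outputs = model(input_ids=input_ids, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Simply replacing every selected token with the mask id is a simplification; the usual BERT scheme also keeps or randomizes some of the selected tokens.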
Thanks to nielsr, there are some good tutorials on fine-tuning BERT with HF.
They should help you understand the whole training structure.
Regards.