Training from scratch without any pre-trained MLM model

reddynt · August 16, 2023, 7:48am

Hi Team

Can I train a model from scratch without using a pre-trained (MLM) model? What results can I expect?

I have a corpus of 50,000 document images and I am trying to train a multimodal token classification model (LiLT, LayoutLM).

Should I pre-train an MLM model and then fine-tune it on the same data corpus, or
should I train the model from scratch directly, with a token classification head attached?

I cannot download a pre-trained model because of organization policy, so I want to get good results with the train-from-scratch approach.
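
To make the first option concrete, here is a rough sketch of what I understand the MLM pre-training step to involve: hide a fraction of the tokens and ask the model to recover them. It is plain Python rather than any library's API. The function name `mask_tokens`, the `[MASK]` string, and the example tokens are made up for illustration, and the 15% rate with the 80/10/10 split is assumed from the usual BERT-style recipe.

```python
import random

MASK = "[MASK]"  # hypothetical mask token; a real tokenizer defines its own


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Build one MLM example: return (inputs, labels) of equal length.

    labels[i] holds the original token at selected positions and None
    elsewhere, meaning that position is not scored by the loss.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict the original token
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)  # 80%: replace with the mask token
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)  # 10%: leave the token unchanged
        else:
            inputs.append(tok)
            labels.append(None)  # unselected position, ignored by the loss
    return inputs, labels


# Toy OCR-like token sequence (invented for this example).
tokens = "invoice number 12345 total amount due 99.50".split()
inputs, labels = mask_tokens(tokens, vocab=tokens, mask_prob=0.5)
for original, model_input, label in zip(tokens, inputs, labels):
    print(original, model_input, label)
```

In the second option this step is skipped entirely, and the randomly initialized model only ever sees the token classification labels.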
