Training a BERT model from scratch on custom sequences


I have a whitespace-separated text file containing sequences of strings (they are not words; the data comes from a different domain).
I want to try pre-training a BERT model on this data.

I have seen tutorials where people fine-tune on different target datasets, but I don't think there is an official tutorial for pre-training on custom data. I think I need to index (tokenize) the data sequences first and then run the pre-training.
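For the indexing step I described, here is a rough sketch of what I have in mind: since the tokens are whitespace-separated and aren't natural-language words, I'd build a word-level vocabulary that reserves BERT's special tokens first, then map each sequence to ids. (The sample data and the vocabulary-size cap are just placeholders.)

```python
# Hypothetical sketch: build a token-to-id vocabulary ("index") from
# whitespace-separated sequences, reserving BERT's special tokens first.
from collections import Counter

SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]

def build_vocab(lines, max_size=30000):
    # Count every whitespace-separated token across all sequences.
    counts = Counter(tok for line in lines for tok in line.split())
    # Special tokens get the lowest ids, as in BERT's own vocab files.
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for tok, _ in counts.most_common(max_size - len(SPECIAL_TOKENS)):
        vocab[tok] = len(vocab)
    return vocab

def encode(line, vocab):
    # Wrap each sequence in [CLS] ... [SEP]; unseen tokens map to [UNK].
    unk = vocab["[UNK]"]
    return [vocab["[CLS]"]] + [vocab.get(t, unk) for t in line.split()] + [vocab["[SEP]"]]

lines = ["A1 B2 C3", "B2 D4"]  # placeholder sequences
vocab = build_vocab(lines)
print(encode("A1 X9", vocab))  # X9 was never seen, so it maps to [UNK]
```

My assumption is that once sequences are encoded like this, the masked-language-modeling objective itself doesn't care that the tokens aren't words, so the rest of the pipeline should work unchanged.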

Are there any resources that could help me with this? Any pointers would be much appreciated.