Hey there.
I am pre-training RoBERTa for masked language modelling, and I trained a tokenizer with ByteLevelBPETokenizer. That produced merges.txt and vocab.json, whose path I have passed on to RobertaForMaskedLM, using a line-by-line dataset.
My question is: what is the default train/test/validation split ratio when pre-training with Trainer, and is there any way to change it? I am getting the loss, epochs, and learning rate in the output, but I never provided any particular split ratio explicitly.
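To make the question concrete, here is a minimal sketch of the kind of explicit split I had in mind (the 90/10 ratio, the seed, and the stand-in corpus lines are placeholders of mine, not anything Trainer does by default):

```python
import random

# Stand-in for the lines of a line-by-line pre-training corpus
lines = [f"sentence {i}" for i in range(100)]

random.seed(0)          # fixed seed so the split is reproducible
random.shuffle(lines)   # shuffle before splitting

split = int(0.9 * len(lines))  # hypothetical 90/10 train/valid ratio
train_lines = lines[:split]
valid_lines = lines[split:]

print(len(train_lines), len(valid_lines))  # 90 10
```

The idea would be that `train_lines` and `valid_lines` end up in two separate line-by-line datasets, passed to Trainer as `train_dataset` and `eval_dataset`.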
Thanks