How to train a language model with whole-word masking using the PyTorch Trainer API

I am thinking of fine-tuning a model by first training a language model from scratch. I have a couple of basic questions related to this:

I want to use whole-word masking when training the LM from scratch, but I could not find how to apply this option using the Trainer.

Here are my dataset and code:

import transformers as tr
from transformers import DataCollatorForLanguageModeling

text = ['I am huggingface fan', 'I love huggingface', ....]

# Token-level masking: 15% of individual (sub-word) tokens are masked
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

trainer = tr.Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_data,
)

trainer.train()

But this collator masks individual tokens, so it does not take whole-word masking into account.
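From what I can tell, transformers also ships a DataCollatorForWholeWordMask that masks all sub-word pieces of a word together. Below is a minimal sketch of the swap I have in mind; I am not sure it is the intended way, and it assumes a BERT-style WordPiece tokenizer (the collator detects word boundaries via the '##' sub-word prefix):

from transformers import DataCollatorForWholeWordMask

# Same constructor arguments as DataCollatorForLanguageModeling, but
# when any sub-word piece of a word is selected for masking, all of
# that word's pieces are masked together
data_collator = DataCollatorForWholeWordMask(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

trainer = tr.Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_data,
)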

  • How can I train the LM with whole-word masking using the PyTorch Trainer? Is the collator swap sketched above the right approach?

  • How can I train on sequences that are longer than the model's max length using the PyTorch Trainer? (A chunking sketch of what I mean follows below.)
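For the second question, the only approach I have found is the one used in the Hugging Face language-modeling examples: tokenize everything, concatenate, and split into fixed-size blocks no longer than the model's max length, so long texts become several training examples instead of being truncated. A rough sketch, assuming a datasets library Dataset built from my text list and a hypothetical block_size of 512:

from datasets import Dataset

block_size = 512  # assumed; should not exceed the model's max length

raw_dataset = Dataset.from_dict({'text': text})

def tokenize(examples):
    return tokenizer(examples['text'])

def group_texts(examples):
    # Concatenate all tokenized sequences, then cut the result into
    # fixed-size blocks so no example exceeds block_size tokens
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = (len(concatenated['input_ids']) // block_size) * block_size
    return {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }

tokenized = raw_dataset.map(tokenize, batched=True, remove_columns=['text'])
train_data = tokenized.map(group_texts, batched=True)

Is this the recommended way to handle long sequences with the Trainer, or is there a built-in option I am missing?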