I want to fine-tune a pretrained BERT for masked language modeling on a custom corpus. I want to use TPUs on Google Cloud, so I'd like to work with TFTrainer to avoid writing my own training loop and worrying about its performance. I can't find any info on how the masking is actually supposed to be performed here; when using Trainer in PyTorch, it seems that DataCollatorForLanguageModeling takes care of this.
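For reference, my understanding of what DataCollatorForLanguageModeling does, following the BERT paper's 80/10/10 scheme (the constants below are my assumptions for bert-base-uncased, and `mask_tokens` is my own illustrative helper, not library code):

```python
import random

MASK_TOKEN_ID = 103    # [MASK] id in bert-base-uncased (assumption)
VOCAB_SIZE = 30522     # bert-base-uncased vocab size (assumption)
IGNORE_INDEX = -100    # label value the loss function ignores

def mask_tokens(input_ids, mlm_probability=0.15, rng=None):
    """BERT-style MLM masking for one example (a list of token ids)."""
    rng = rng or random.Random()
    masked = list(input_ids)
    labels = [IGNORE_INDEX] * len(input_ids)
    for i, tok in enumerate(input_ids):
        if rng.random() >= mlm_probability:
            continue                               # not selected: loss ignores it
        labels[i] = tok                            # model must predict the original
        r = rng.random()
        if r < 0.8:
            masked[i] = MASK_TOKEN_ID              # 80%: replace with [MASK]
        elif r < 0.9:
            masked[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # remaining 10%: keep the original token
    return masked, labels
```

(The real collator also avoids selecting special tokens like [CLS] and [SEP], which this sketch skips.)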
This is what my setup looks like:
from datasets import load_dataset
from transformers import BertTokenizer, TFTrainer
import tensorflow as tf
import tensorflow_addons as tfa

tokenizer = BertTokenizer.from_pretrained(args.tokenizer_name)
ds = load_dataset('text', data_files=[args.train_data_file])
dataset = ds['train'].map(lambda examples: tokenizer(examples['text'], truncation=True), batched=True)
dataset.set_format(type='tensorflow', columns=['input_ids', 'token_type_ids', 'attention_mask'])
features = {x: dataset[x].to_tensor(default_value=0, shape=[None, tokenizer.model_max_length]) for x in ['input_ids', 'token_type_ids', 'attention_mask']}
tfdataset = tf.data.Dataset.from_tensor_slices(features).batch(32)
trainer = TFTrainer(model, training_args, tfdataset, optimizers=(tfa.optimizers.LAMB(), None))
trainer.train()
trainer.save_model(args.save_path)
I'm using the datasets library to load a line-by-line txt file and convert it to a tensorflow dataset; this then goes straight to the trainer at the moment. Where and how in this process is the masking supposed to be added?
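In case it helps frame the question, this is roughly where I'd imagine the masking would slot in: a function mapped over the tf.data pipeline that produces (features, labels) pairs (I'm assuming TFTrainer expects the dataset to yield such tuples; `mask_tokens` below is my own sketch of the 80/10/10 scheme, not a transformers API, and the constants are assumptions for bert-base-uncased):

```python
import tensorflow as tf

MASK_TOKEN_ID = 103    # [MASK] id in bert-base-uncased (assumption)
VOCAB_SIZE = 30522     # bert-base-uncased vocab size (assumption)

def mask_tokens(input_ids, mlm_probability=0.15):
    """BERT-style masking on a batch of int32 token ids."""
    shape = tf.shape(input_ids)
    selected = tf.random.uniform(shape) < mlm_probability
    # labels: original ids where selected, -100 (ignored by the loss) elsewhere
    labels = tf.where(selected, input_ids, tf.fill(shape, -100))
    # 80% of selected positions -> [MASK]
    to_mask = selected & (tf.random.uniform(shape) < 0.8)
    input_ids = tf.where(to_mask, tf.fill(shape, MASK_TOKEN_ID), input_ids)
    # half of the rest (10% overall) -> random token; final 10% unchanged
    to_random = selected & ~to_mask & (tf.random.uniform(shape) < 0.5)
    random_ids = tf.random.uniform(shape, maxval=VOCAB_SIZE, dtype=input_ids.dtype)
    input_ids = tf.where(to_random, random_ids, input_ids)
    return input_ids, labels

def add_mlm_labels(features):
    """Map step: replace input_ids with their masked version, emit labels."""
    masked_ids, labels = mask_tokens(features['input_ids'])
    return dict(features, input_ids=masked_ids), labels

# tfdataset = tfdataset.map(add_mlm_labels)  # before handing it to the trainer
```

Is something along these lines what's expected, or does TFTrainer have a built-in hook for this?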