Hi @AlanFeder, judging by the stack trace my first guess is that the problem comes from a conflict between padding in the `dataset.map` operation vs padding on-the-fly in the `Trainer`.
As described in the `Trainer` docs, when you pass the tokenizer to the `Trainer` it will be used as follows:

> The tokenizer used to preprocess the data. If provided, will be used to automatically pad the inputs to the maximum length when batching inputs, and it will be saved along the model to make it easier to rerun an interrupted training or reuse the fine-tuned model.
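In other words, when the `Trainer` gets a tokenizer it pads each batch dynamically at training time, roughly as if you had supplied a `DataCollatorWithPadding` yourself. A quick sketch of the equivalence (here `tokenizer`, `model`, `training_args`, and `train_dataset` just stand in for your own objects):

```python
from transformers import DataCollatorWithPadding, Trainer

# Roughly what the Trainer does internally when you pass tokenizer=tokenizer:
# each batch is padded to the length of its longest sequence, not up front.
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,  # equivalent to passing tokenizer=tokenizer
)
```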
So it seems that in your code, you’re doing padding twice: once in `dataset.map` and then again during training.
Can you remove the `padding=True` argument from your tokenization step and see if that works?
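For reference, here’s a minimal end-to-end sketch of the pattern I mean (the model checkpoint, dataset, and column names are just placeholders for illustration, so adapt them to your setup):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Placeholder checkpoint and dataset -- swap in your own.
checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
raw_datasets = load_dataset("imdb")

def tokenize_function(examples):
    # Note: truncation only, no padding=True here --
    # the Trainer will pad each batch on the fly.
    return tokenizer(examples["text"], truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out"),
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    tokenizer=tokenizer,  # dynamic padding happens here, per batch
)
trainer.train()
```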