I have a similar problem!
I am working on a summarization model with BART.
So far I can train the model fine, but when I call trainer.evaluate() it returns this warning:
“”"
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify ‘dtype=object’ when creating the ndarray.
result = getattr(asarray(obj), method)(*args, **kwds)
“”"
and this error:
“”"
ValueError : could not broadcast input array from shape (50,32,50265) into shape (50,)
“”""
50 is the number of rows in my evaluation dataset,
32 is max_target_length, and
50265 is the vocabulary size, as shown in the model summary:
BartForConditionalGeneration(
  (model): BartModel(
    (shared): Embedding(50265, 768, padding_idx=1)
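If it helps, here is a minimal NumPy-only sketch of what I suspect is happening (hypothetical batch shapes, with a tiny vocab standing in for 50265): the last evaluation batch is smaller and padded to a different length, so the accumulated logit arrays are ragged and cannot be stacked into one dense array.

import numpy as np

# hypothetical per-batch logits: (batch, seq_len, vocab); tiny vocab stands in for 50265
batch_a = np.zeros((8, 32, 10))
batch_b = np.zeros((2, 27, 10))  # last batch is smaller and padded to a different length

# without dtype=object this is exactly the VisibleDeprecationWarning from the traceback
# (and a hard ValueError on newer NumPy), because no dense array fits ragged batches
ragged = np.asarray([batch_a, batch_b], dtype=object)
print(ragged.shape)  # (2,) - an object array, not a dense (50, 32, 50265) one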
Here is the code:
from transformers import AutoTokenizer

model_checkpoint = "facebook/bart-base"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

max_input_length = 64
max_target_length = 32

def preprocess_function(example):
    model_inputs = tokenizer(example["text"], max_length=max_input_length, padding=True, truncation=True)
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(example["summary"], max_length=max_target_length, padding=True, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
tokenized_datasets = dataset.map(preprocess_function, batched=True, remove_columns=["text", "summary"])
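As a side note, my understanding is that padding=True pads only to the longest sequence within each mapped batch, so different batches can come out of preprocess_function with different lengths, which might be where the ragged shapes come from. A quick check with made-up strings:

batch_1 = preprocess_function({"text": ["a short one", "a much longer input sentence than the first"],
                               "summary": ["short", "a longer summary"]})
batch_2 = preprocess_function({"text": ["tiny"], "summary": ["tiny"]})
# lengths match within a batch but can differ across batches
print(len(batch_1["input_ids"][0]), len(batch_2["input_ids"][0]))

Back to the script: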
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

from transformers import TrainingArguments

training_args = TrainingArguments("test_trainer")

from transformers import Trainer
import numpy as np
from datasets import load_metric

# metric = load_metric("accuracy")
metric = load_metric("glue", "mrpc")
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
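One thing I am not sure about (my own reasoning, so it may be wrong): argmax over the last axis turns the (batch, seq_len, vocab) logits into (batch, seq_len) token ids, while the MRPC metric expects a single prediction per example:

import numpy as np

fake_logits = np.zeros((8, 32, 10))  # made-up (batch, seq_len, vocab) logits, tiny vocab
fake_preds = np.argmax(fake_logits, axis=-1)
print(fake_preds.shape)  # (8, 32): one token id per position, not one label per example

The trainer itself is set up like this: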
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=full_train_dataset,
    eval_dataset=full_eval_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
)
Then I train the model and it works fine, but trainer.evaluate() returns that error. The evaluation does start, though:
***** Running Evaluation *****
Num examples = 50
Batch size = 8
6%|▋ | 12/189 [13:07<3:13:36, 65.63s/it]
33%|███▎ | 62/189 [08:23<17:12, 8.13s/it]
and then the error!
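In case it is relevant, this is the variant I am thinking of trying instead: switching to Seq2SeqTrainer with predict_with_generate=True, so that compute_metrics receives generated token ids instead of raw logits. This is an untested sketch, and using ROUGE via load_metric("rouge") is my own assumption (the MRPC metric above was just a placeholder):

import numpy as np
from datasets import load_metric
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

rouge = load_metric("rouge")  # assumption: ROUGE instead of the MRPC placeholder

seq2seq_args = Seq2SeqTrainingArguments(
    "test_trainer",
    predict_with_generate=True,  # eval_pred then holds generated token ids, not logits
)

def compute_metrics_gen(eval_pred):
    preds, labels = eval_pred
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # some collators replace padding in labels with -100; undo that before decoding
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = rouge.compute(predictions=decoded_preds, references=decoded_labels)
    return {k: v.mid.fmeasure for k, v in result.items()}

seq2seq_trainer = Seq2SeqTrainer(
    model=model,
    args=seq2seq_args,
    train_dataset=full_train_dataset,
    eval_dataset=full_eval_dataset,
    compute_metrics=compute_metrics_gen,
    tokenizer=tokenizer,
)

Would that be the right way to handle evaluation for summarization, or is there a way to make the plain Trainer work here?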