Trainer.evaluate() is freezing

Hello, I’m trying to train a RoBERTa model for sequence classification. Previously, I was able to train it with the “test_trainer” arguments. However, when I subsequently ran trainer.evaluate(), the evaluation would appear to complete and then the process would stall out.


(screenshot: trainer.evaluate() freezes here)

When I tried incorporating the evaluation step into the training arguments instead (evaluation_strategy), I got the same behavior: after the first epoch/step interval finished training, the evaluation step would run and then the whole process would freeze.


(screenshot: evaluation_strategy in training arguments freezes here)

CODE:

import numpy as np
import evaluate
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    evaluation_strategy="steps",
    logging_dir="./output/logs",
    logging_strategy="steps",
    logging_steps=10,
    learning_rate=5e-5,
    weight_decay=0.01,
    warmup_steps=500,
    save_strategy="steps",
    load_best_model_at_end=True,
    save_total_limit=2,
)

metric = evaluate.load("accuracy")  # accuracy metric from the evaluate library

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=eval_data,
    compute_metrics=compute_metrics,
)
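For reference, the calls that hang are just the usual train-then-evaluate sequence (a minimal sketch; train_data and eval_data are defined elsewhere):

trainer.train()               # training runs to completion without problems
metrics = trainer.evaluate()  # the eval progress bar reaches the end, then the process stalls
print(metrics)                # never reached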

No error logs, just the progress bar stalling out.

On transformers v4.33.1

bump on this

another bump

I’m also encountering something similar when following the Fine-tune Whisper model tutorial.

My code looks something like this:

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-eng-gen",  # change to a repo name of your choice
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,  # increase by 2x for every 2x decrease in batch size
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=1000,
    gradient_checkpointing=True,
    fp16=True,
    evaluation_strategy="steps",
    per_device_eval_batch_size=8,
    predict_with_generate=True,
    generation_max_length=225,
    save_steps=1000,
    eval_steps=1000,
    logging_steps=25,
    report_to=["tensorboard"],
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,
    push_to_hub=True,
    ignore_data_skip=True
)

trainer = Seq2SeqTrainer(
    args=training_args,
    model=model,
    train_dataset=common_voice_train,
    eval_dataset=common_voice_test,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    tokenizer=processor.feature_extractor,
)

trainer.train()

Output from trainer.train():

Dataset is an IterableDataset.

Would love to hear any input and possible fixes for this issue.

Bump. Evaluation never completes for me when using multi-GPU.

Hi there,

I ran into the same issue using the official run_qa.py script provided with transformers 4.40. For me, the problem happened exclusively with my fine-tuned RoBERTa-base; RoBERTa-large and BERT-base-uncased were fine.

My workaround is to set --eval_do_concat_batches to False, which keeps the evaluation loop from concatenating all logits into one big array in memory as the batches are evaluated. Since this breaks output.predictions down into a list of per-batch predictions, you need to concatenate them yourself before passing them to the metric function. My hack in the trainer_qa.py file is:

# output.predictions is now a list of per-batch tuples instead of one concatenated tuple
predictions = [[], [], []]
for t in output.predictions:
    for i, item in enumerate(t):
        predictions[i].append(item)
# concatenate the per-batch start and end logits back into single arrays
predictions = [np.concatenate(item) for item in predictions[:2]]

This prevents my evaluation from running slower and slower with each batch.
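If you would rather set this in code than on the command line, the same option can go straight into TrainingArguments (a minimal sketch, assuming a transformers version that exposes eval_do_concat_batches; the other arguments are placeholders):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./qa_eval",          # placeholder
    per_device_eval_batch_size=8,    # placeholder
    eval_do_concat_batches=False,    # collect per-batch outputs in a list instead of concatenating them each step
)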
