Trainer.evaluate() freezing

Hello, I’m trying to train a RoBERTa model for sequence classification. Previously, I was able to train it with the “test_trainer” arguments without problems. However, when I subsequently run trainer.evaluate(), the evaluation appears to run to completion and then the process stalls.


(screenshot: trainer.evaluate() freezes at this point)

When I instead enabled evaluation during training (via evaluation_strategy in the training arguments), I got the same behavior: after the first epoch/steps finished training, the evaluation step ran and then the whole process froze.


(screenshot: evaluation_strategy in the training arguments freezes at this point)

CODE:

import numpy as np
import evaluate
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    evaluation_strategy="steps",
    logging_dir="./output/logs",
    logging_strategy="steps",
    logging_steps=10,
    learning_rate=5e-5,
    weight_decay=0.01,
    warmup_steps=500,
    save_strategy="steps",
    load_best_model_at_end=True,
    save_total_limit=2,
)

metric = evaluate.load("accuracy")  # accuracy metric from the evaluate library

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=eval_data,
    compute_metrics=compute_metrics
)
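
For reference, one way to exercise just the evaluation step in isolation is to run it on a small slice with the DataLoader workers disabled (a minimal sketch, assuming eval_data is a datasets.Dataset so .select() is available; the slice size and output directory are arbitrary):

# Run evaluation alone on a tiny slice, with no DataLoader worker processes
# and pinned memory off, to see whether the hang depends on the dataloader setup.
debug_args = TrainingArguments(
    output_dir="./output/debug_eval",
    per_device_eval_batch_size=8,
    dataloader_num_workers=0,
    dataloader_pin_memory=False,
)
debug_trainer = Trainer(
    model=model,
    args=debug_args,
    eval_dataset=eval_data.select(range(32)),  # small slice of the eval set
    compute_metrics=compute_metrics,
)
print(debug_trainer.evaluate())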

No error logs, just the progress bar stalling out.

On transformers v4.33.1
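
Since there are no error logs, one way to see where the process is stuck is Python’s built-in faulthandler, which can periodically dump every thread’s stack trace to stderr (a sketch; the 60-second interval is arbitrary):

import faulthandler

# Dump all thread stack traces every 60 seconds; when the run freezes,
# the last dump shows where each thread is blocked.
faulthandler.dump_traceback_later(60, repeat=True)

trainer.train()
trainer.evaluate()

faulthandler.cancel_dump_traceback_later()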


one more bump


I’m running into the same issue and have no idea what’s happening. Have you figured out a solution?


I’m running into a similar issue, but in my case training typically stops after several epochs and never makes it through the full 25. All of a sudden progress just halts: the GPU stops doing any computation but still has memory reserved, typically 4-6 GB on a card with 24 GB available. The CPU never shows significant memory pressure either and usually has around 32-40 GB of RAM free.
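
One thing that might help narrow down where progress stops is a heartbeat callback that logs the last completed step and each finished evaluation (a sketch using transformers’ TrainerCallback, assuming a Trainer instance named trainer as in the original post; the print interval is arbitrary):

from transformers import TrainerCallback

class HeartbeatCallback(TrainerCallback):
    # Print a marker as training progresses so the log shows exactly
    # which step was the last one to complete before the freeze.
    def on_step_end(self, args, state, control, **kwargs):
        if state.global_step % 50 == 0:
            print(f"heartbeat: finished step {state.global_step}", flush=True)

    def on_evaluate(self, args, state, control, **kwargs):
        print(f"heartbeat: evaluation finished at step {state.global_step}", flush=True)

trainer.add_callback(HeartbeatCallback())
trainer.train()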