Hello, I’m trying to train a RoBERTa model for sequence classification. Previously, I was able to train it with the basic “test_trainer” arguments: training completed fine, but when I subsequently ran trainer.evaluate(), the evaluation would finish its pass and then the whole process would freeze without ever returning.
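For reference, this is roughly what that earlier run looked like (a minimal sketch; model, train_data, and eval_data come from my own preprocessing and are assumed to be defined):

CODE:
from transformers import Trainer, TrainingArguments

# Bare-bones setup, matching the quickstart-style "test_trainer" run
training_args = TrainingArguments(output_dir="test_trainer")
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=eval_data,
)

trainer.train()     # completes normally
trainer.evaluate()  # <-- freezes here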
When I tried incorporating the evaluation step into the training arguments instead (via evaluation_strategy), I got the same behavior: after the first interval of training steps finished, the evaluation step would run and then the whole process would freeze.
CODE:
import numpy as np
import evaluate
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    evaluation_strategy="steps",
    logging_dir="./output/logs",
    logging_strategy="steps",
    logging_steps=10,
    learning_rate=5e-5,
    weight_decay=0.01,
    warmup_steps=500,
    save_strategy="steps",
    load_best_model_at_end=True,
    save_total_limit=2,
)

# Accuracy metric from the evaluate library
metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# model, train_data, and eval_data are created earlier in the script
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=eval_data,
    compute_metrics=compute_metrics,
)
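The run itself is just:

CODE:
trainer.train()  # the first in-training evaluation runs, then everything freezes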
There are no error logs; the progress bar just stalls out. This is on transformers v4.33.1.
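In case it’s useful, here is a minimal sketch of one way to capture a traceback from the hung process (standard library only; the 300-second timeout is an arbitrary choice):

CODE:
import faulthandler

# If the process is still running after 300 seconds, dump the Python
# traceback of every thread to stderr, and repeat every 300 seconds.
# This shows which call the hung process is actually stuck in.
faulthandler.dump_traceback_later(300, repeat=True)

trainer.train()

# Disable the watchdog if training/evaluation finishes normally
faulthandler.cancel_dump_traceback_later()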