Hi all,
I’m currently trying to fine-tune xlm-roberta-base for a binary classification task, using fairly standard code:
[...]
data_files = {'train': train_path, 'test': test_path}
model_name = 'xlm-roberta-base'
tokenizer = AutoTokenizer.from_pretrained(model_name)
train_dataset = create_dataset(data_files['train'], tokenizer, shuffle=True)  # just loads the dataset from file and tokenizes the text
test_dataset = create_dataset(data_files['test'], tokenizer)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
metric = evaluate.load("accuracy")
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
training_args = TrainingArguments(output_dir=output_dir, num_train_epochs=5,
                                  per_device_train_batch_size=16, per_device_eval_batch_size=16, data_seed=42,
                                  logging_dir='logs', logging_strategy='epoch', save_strategy='no',
                                  evaluation_strategy='epoch')
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset,
                  compute_metrics=compute_metrics, eval_dataset=test_dataset)
trainer.train()
trainer.save_model()
[...]
I’m using a balanced dataset with 50% examples of class 0 and 50% of class 1. On the evaluation set, the model always predicts the same class. If I switch to another model, for example “bert-base-cased”, it reaches an accuracy of 92%. What am I missing?
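To make the symptom concrete, one way to check it would be to count how often each class is predicted on the evaluation set. This is only a minimal sketch, assuming the logits are available as a NumPy array of shape (n_examples, 2). The helper name `prediction_distribution` and the toy logits are hypothetical and not part of the script above.

```python
import numpy as np

def prediction_distribution(logits, num_labels=2):
    """Return the fraction of examples assigned to each class."""
    predictions = np.argmax(logits, axis=-1)
    # minlength keeps a zero entry for classes that are never predicted
    counts = np.bincount(predictions, minlength=num_labels)
    return counts / counts.sum()

# Toy logits (made up for illustration) standing in for the real model output,
# e.g. trainer.predict(test_dataset).predictions
toy_logits = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]])
distribution = prediction_distribution(toy_logits)
print(distribution)
```

A distribution with all of its mass on one class, as with these toy logits, would match what I am seeing with xlm-roberta-base.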
Thanks in advance!