Hey, I have a custom dataset. Can you send a sample script to get the accuracy on such a dataset? I was going through the examples and couldn’t find code that does that. Can someone point me to a resource?
My dataset has the format:
premise, hypothesis, label (0 or 1)
and my model is DeBERTa.
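To be concrete, this is roughly how I load it (a minimal sketch; train.csv and test.csv are placeholder file names for my actual splits):
from datasets import load_dataset

# Each CSV has premise, hypothesis and label columns (placeholder file names)
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})
print(dataset["train"][0])
# {'premise': '...', 'hypothesis': '...', 'label': 0}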
Hey @NDugar, the simplest way to do this would be with the accuracy metric in the datasets library, e.g.
from datasets import load_metric
# You might need to install scikit-learn
metric = load_metric("accuracy")
results = metric.compute(references=[0, 1], predictions=[0, 1])
results
# {'accuracy': 1.0}
As you can see in this example, the main bit of work you need to do is create the arrays for the references (i.e. the ground-truth labels) and the predictions (predicted labels). If you’re using the Trainer, you can get these arrays easily by calling Trainer.predict(test_dataset), along the lines of the sketch below.
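Something like this should do the trick (a rough sketch, where trainer and test_dataset stand in for your own Trainer instance and tokenized test split):
import numpy as np
from datasets import load_metric

metric = load_metric("accuracy")

# Trainer.predict returns a PredictionOutput with the raw logits and the labels
output = trainer.predict(test_dataset)
preds = np.argmax(output.predictions, axis=-1)  # logits -> predicted class ids

results = metric.compute(references=output.label_ids, predictions=preds)
print(results)
# e.g. {'accuracy': ...}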
The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: premise, id, hypothesis.
***** Running Evaluation *****
Num examples = 187
Batch size = 4
100%|██████████████████████████████████████| 47/47 [00:03<00:00, 13.15it/s]
Hey, I was able to run the evaluation in a different way, but I am not getting the results at the end or in the logs. How do I get the result? Also, it seems to ignore the key columns; what do I do? @lewtun
Hey @NDugar, if you’re using the Trainer, my suggestion would be to run Trainer.predict(your_test_dataset) so you can get all the predictions. Then you should be able to feed those into the accuracy metric in a second step (or whatever metric you’re interested in), as in the sketch above. As for the ignored columns: that warning is expected. The Trainer simply drops raw columns like premise and hypothesis because the model’s forward() doesn’t accept them, so nothing is lost as long as your dataset has been tokenized.
If you’re still having trouble, I suggest providing a minimal reproducible example, as explained here.
Oh, um, small question: what is the difference between Trainer.predict(your_test_dataset) and Trainer.evaluate(your_test_dataset)? I think predict might solve my problem; I was using evaluate and getting errors previously.
Trainer.predict() will return the model’s predictions (the raw logits) for your dataset, while Trainer.evaluate() will go one step further and compute the loss plus anything else you’ve defined in the compute_metrics() function. See the sketch below.
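In code, the contrast looks roughly like this (a sketch; it assumes the trainer was built with a compute_metrics function like the one here, and trainer / test_dataset are placeholders for your own objects):
import numpy as np
from datasets import load_metric

metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    # eval_pred bundles the logits and the ground-truth labels
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return metric.compute(predictions=preds, references=labels)

# Pass compute_metrics=compute_metrics when you construct the Trainer

# predict: gives you the raw outputs to post-process yourself
output = trainer.predict(test_dataset)
print(output.predictions.shape)  # (num_examples, num_labels) logits
print(output.label_ids[:5])      # ground-truth labels

# evaluate: runs the same prediction loop but returns a metrics dict,
# including the loss and whatever compute_metrics adds
print(trainer.evaluate(test_dataset))
# e.g. {'eval_loss': ..., 'eval_accuracy': ..., ...}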