Sample evaluation script on custom dataset

Hey, I have a custom dataset. Can you send a sample script to compute the accuracy on such a dataset? I was going through the examples and I couldn’t find code that does this. Can someone point me to a resource?

My dataset has the format:
premise, hypothesis, label (0 or 1)
and my model is DeBERTa.




Side note: I am using a zero-shot version of DeBERTa: NDugar/ZSD-microsoft-v2xxlmnli · Hugging Face

Hey @NDugar, the simplest way to do this would be with the accuracy metric in the datasets library, e.g.

from datasets import load_metric

# You might need to install scikit-learn
metric = load_metric("accuracy")
results = metric.compute(references=[0, 1], predictions=[0, 1])
# {'accuracy': 1.0}

As you can see in this example, the main bit of work you need to do is create the arrays for the references (i.e. the ground-truth labels) and the predictions (the predicted labels). If you’re using the Trainer, you can get these arrays easily by calling Trainer.predict(test_dataset).
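For example, here is a minimal sketch of that step with dummy logits standing in for the real output of Trainer.predict (the arrays below are made up for illustration; your real logits array has shape [num_examples, num_labels]):

```python
import numpy as np

# Fake logits, as if taken from Trainer.predict(test_dataset).predictions
logits = np.array([[2.1, -0.5],   # argmax -> label 0
                   [0.3,  1.7],   # argmax -> label 1
                   [1.2,  0.8]])  # argmax -> label 0
references = [0, 1, 1]            # ground-truth labels from your dataset

# Turn logits into hard label predictions
predictions = np.argmax(logits, axis=-1).tolist()

# These two lists are exactly what metric.compute() expects; computed by hand here
accuracy = sum(p == r for p, r in zip(predictions, references)) / len(references)
print(predictions)  # [0, 1, 0]
```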

In the sample code you wrote, where do I mention the model and the dataset?

If you’re after an end-to-end example, you could try adapting the official text classification example here 🙂


OK, thank you.

Hey, I don’t think this code works for my evaluation. Is it possible you have other code for this purpose?

The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: premise, id, hypothesis.
***** Running Evaluation *****
  Num examples = 187
  Batch size = 4
100%|██████████████████████████████████████| 47/47 [00:03<00:00, 13.15it/s]

Hey, I was able to run the evaluation a different way, but I am not getting the results at the end or in the logs. How do I get the result? Also, it seems to ignore the key columns; what do I do? @lewtun

Hey @NDugar if you’re using the Trainer my suggestion would be to run Trainer.predict(your_test_dataset) so you can get all the predictions. Then you should be able to feed those into the accuracy metric in a second step (or whatever metric you’re interested in).
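Roughly, those two steps look like this. Since running the real model is out of scope here, the PredictionOutput below is mocked; in your code it comes straight from Trainer.predict(your_test_dataset):

```python
import numpy as np
from collections import namedtuple

# Trainer.predict returns a PredictionOutput with .predictions (logits)
# and .label_ids (ground truth) among its fields; we fake one for illustration
PredictionOutput = namedtuple("PredictionOutput", ["predictions", "label_ids"])
output = PredictionOutput(
    predictions=np.array([[1.5, -0.2], [-0.4, 0.9]]),  # logits
    label_ids=np.array([0, 0]),                        # ground-truth labels
)

# Step 1: turn logits into hard label predictions
preds = np.argmax(output.predictions, axis=-1)

# Step 2: feed predictions + references into your metric of choice
accuracy = float((preds == output.label_ids).mean())
print(accuracy)
```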

If you’re still having trouble, I suggest providing a minimal reproducible example, as explained here 🙂

Oh, small question: what is the difference between Trainer.predict(your_test_dataset) and Trainer.evaluate(your_test_dataset)? I think predict might solve my problem. I was using evaluate and getting errors previously.

Trainer.predict() will return the model’s predictions for your dataset, while Trainer.evaluate() will go one step further and compute the loss plus anything else you’ve defined in the compute_metrics() function.
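For example, a minimal compute_metrics you could pass to Trainer(compute_metrics=...) might look like this (the fake logits at the bottom are just a sanity check; the Trainer passes the real (logits, labels) pair for you during evaluate):

```python
import numpy as np

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair supplied by the Trainer
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}

# Quick sanity check with made-up data: first example right, second wrong
fake_logits = np.array([[0.1, 0.9], [0.8, 0.2]])
fake_labels = np.array([1, 1])
print(compute_metrics((fake_logits, fake_labels)))
```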