Sample evaluation script on custom dataset

Hey, I have a custom dataset. Can you send a sample script to compute the accuracy on such a dataset? I was going through the examples and I couldn’t find code that does this. Can someone point me to a resource?

My dataset has the format:
premise, hypothesis, label (0 or 1)
and my model is DeBERTa.




Side note: I am using a zero-shot version of DeBERTa: NDugar/ZSD-microsoft-v2xxlmnli · Hugging Face

Hey @NDugar, the simplest way to do this would be with the accuracy metric in the datasets library, e.g.

from datasets import load_metric

# You might need to install scikit-learn
metric = load_metric("accuracy")
results = metric.compute(references=[0, 1], predictions=[0, 1])
# {'accuracy': 1.0}

As you can see in this example, the main bit of work you need to do is create the arrays for the references (i.e. the ground-truth labels) and the predictions (the predicted labels). If you’re using the Trainer, you can get these arrays easily by calling Trainer.predict(test_dataset).
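For example, here is a minimal sketch of that step with dummy logits standing in for the real output of Trainer.predict (the arrays below are made up for illustration; your real logits array has shape [num_examples, num_labels]):

```python
import numpy as np

# Fake logits, as if taken from Trainer.predict(test_dataset).predictions
logits = np.array([[2.1, -0.5],   # argmax -> label 0
                   [0.3,  1.7],   # argmax -> label 1
                   [1.2,  0.8]])  # argmax -> label 0
references = [0, 1, 1]            # ground-truth labels from your dataset

# Turn logits into hard label predictions
predictions = np.argmax(logits, axis=-1).tolist()

# These two lists are exactly what metric.compute() expects; computed by hand here
accuracy = sum(p == r for p, r in zip(predictions, references)) / len(references)
print(predictions)  # [0, 1, 0]
```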

In the sample code you wrote, where do I mention the model and the dataset?

If you’re after an end-to-end example, you could try adapting the official text classification example here 🙂


OK, thank you.

Hey, I don’t think this code works for my evaluation. Is it possible you have other code for this purpose?

The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: premise, id, hypothesis.
***** Running Evaluation *****
  Num examples = 187
  Batch size = 4
100%|██████████████████████████████████████| 47/47 [00:03<00:00, 13.15it/s]

Hey, I was able to run the evaluation a different way, but I am not getting the results at the end or in the logs. How do I get the result? Also, it seems to ignore the key columns; what do I do? @lewtun

Hey @NDugar if you’re using the Trainer my suggestion would be to run Trainer.predict(your_test_dataset) so you can get all the predictions. Then you should be able to feed those into the accuracy metric in a second step (or whatever metric you’re interested in).
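Roughly, those two steps look like this. Since running the real model is out of scope here, the PredictionOutput below is mocked; in your code it comes straight from Trainer.predict(your_test_dataset):

```python
import numpy as np
from collections import namedtuple

# Trainer.predict returns a PredictionOutput with .predictions (logits)
# and .label_ids (ground truth) among its fields; we fake one for illustration
PredictionOutput = namedtuple("PredictionOutput", ["predictions", "label_ids"])
output = PredictionOutput(
    predictions=np.array([[1.5, -0.2], [-0.4, 0.9]]),  # logits
    label_ids=np.array([0, 0]),                        # ground-truth labels
)

# Step 1: turn logits into hard label predictions
preds = np.argmax(output.predictions, axis=-1)

# Step 2: feed predictions + references into your metric of choice
accuracy = float((preds == output.label_ids).mean())
print(accuracy)
```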

If you’re still having trouble, I suggest providing a minimal reproducible example, as explained here 🙂

Oh, small question: what is the difference between Trainer.predict(your_test_dataset) and Trainer.evaluate(your_test_dataset)? I think predict might solve my problem. I was using evaluate and getting errors previously.

Trainer.predict() will return the model’s predictions for your dataset, while Trainer.evaluate() will go one step further and compute the loss plus anything else you’ve defined in the compute_metrics() function.
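For example, a minimal compute_metrics you could pass to Trainer(compute_metrics=...) might look like this (the fake logits at the bottom are just a sanity check; the Trainer passes the real (logits, labels) pair for you during evaluate):

```python
import numpy as np

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair supplied by the Trainer
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}

# Quick sanity check with made-up data: first example right, second wrong
fake_logits = np.array([[0.1, 0.9], [0.8, 0.2]])
fake_labels = np.array([1, 1])
print(compute_metrics((fake_logits, fake_labels)))
```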