How to get the accuracy of a pretrained model in Hugging Face?

I want to use a pretrained model from the Hugging Face Hub to predict on my own dataset (no fine-tuning, only prediction with a pipeline). For example, this model doesn’t provide an F1 score:

from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("cahya/xlm-roberta-large-indonesian-NER")

model = AutoModelForTokenClassification.from_pretrained("cahya/xlm-roberta-large-indonesian-NER")
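
For prediction I just wrap these in a pipeline, roughly like this (the example sentence is only an illustration):

from transformers import pipeline

# inference only, no fine-tuning
ner = pipeline("token-classification", model=model, tokenizer=tokenizer)
print(ner("Budi bekerja di Jakarta."))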

How can I get the accuracy of the model itself, since I need to make sure the model gets a score on my predictions? I checked the Hugging Face docs but didn’t find a function to get the accuracy of a pretrained model. Thanks.

# In TensorFlow we can do this
tf.keras.metrics.Accuracy()

Hello @dwisaji!

If you’re trying to find the F1 score on the evaluation split from the training process, unfortunately I think you’ll have to reach out to the model author (cahya / cahya-wirawan, Cahya Wirawan, on GitHub) directly to get it from them. You’d need the evaluation set, which doesn’t look like it’s linked in the docs. Authors have the option of including evaluation results in the model card, but the Hub doesn’t force them to do so, and the results don’t get automatically wrapped up with the model.
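
If you just want to check whether the author published any evaluation numbers at all, you can look at the model card metadata. This is only a quick sketch using huggingface_hub’s ModelCard; many repos simply have no model-index section, in which case there is nothing to read out:

from huggingface_hub import ModelCard

# Look for a "model-index" section, which is where authors can report metrics.
card = ModelCard.load("cahya/xlm-roberta-large-indonesian-NER")
reported = card.data.to_dict().get("model-index")
print(reported or "No evaluation results reported in the model card")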

If instead you’re trying to calculate the F1 score on your own labelled dataset, you can use either AutoEvaluate’s Model Evaluator or the HF Evaluate library.
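
With the Evaluate library, a minimal sketch looks like this (the checkpoint, texts, and labels below are just placeholders to show the shape; the label_mapping has to match whatever label strings your model’s pipeline actually returns):

import evaluate
from transformers import pipeline

# Illustration only: any text-classification checkpoint works the same way.
pipe = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")

texts = ["I really enjoyed this.", "This was a waste of time."]
references = [1, 0]  # your own integer labels (here 1 = positive, 0 = negative)

# Map the label strings the pipeline returns to your dataset's integer ids.
label_mapping = {"NEGATIVE": 0, "POSITIVE": 1}
predictions = [label_mapping[out["label"]] for out in pipe(texts)]

f1 = evaluate.load("f1")
print(f1.compute(predictions=predictions, references=references))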

Let me know if I misunderstood the question!

Hi, I got a little bit confused when using the Evaluator. For example, I’m using the indonlu dataset; in their “smsa” subset the labels are {"positive": 0, "neutral": 1, "negative": 2}, but when I run it through the evaluator I get this error:


KeyError                                  Traceback (most recent call last)
Cell In [5], line 1
----> 1 eval_results = task_evaluator.compute(
      2     model_or_pipeline="mdhugol/indonesia-bert-sentiment-classification",
      3     data=data,
      4     label_mapping={"positive": 0, "neutral": 1,"negative": 2}
      5 )

File c:\users\w i n d o w s\pycharmprojects\pythonproject\venv\lib\site-packages\evaluate\evaluator\text_classification.py:105, in TextClassificationEvaluator.compute(self, *args, **kwargs)
     41 def compute(self, *args, **kwargs) -> Tuple[Dict[str, float], Any]:
     42     """
     43     Compute the metric for a given pipeline and dataset combination.
     44     Args:
   (...)
    102     >>> )
    103     ```"""
--> 105     result = super().compute(*args, **kwargs)
    107     return result

File c:\users\w i n d o w s\pycharmprojects\pythonproject\venv\lib\site-packages\evaluate\evaluator\base.py:200, in Evaluator.compute(self, model_or_pipeline, data, metric, tokenizer, feature_extractor, strategy, confidence_level, n_resamples, device, random_state, input_column, label_column, label_mapping)
    198 # Compute predictions
    199 predictions, perf_results = self.call_pipeline(pipe, pipe_inputs)
--> 200 predictions = self.predictions_processor(predictions, label_mapping)
    202 metric_inputs.update(predictions)
    204 # Compute metrics from references and predictions

File c:\users\w i n d o w s\pycharmprojects\pythonproject\venv\lib\site-packages\evaluate\evaluator\text_classification.py:35, in TextClassificationEvaluator.predictions_processor(self, predictions, label_mapping)
     34 def predictions_processor(self, predictions, label_mapping):
---> 35     predictions = [
     36         label_mapping[element["label"]] if label_mapping is not None else element["label"]
     37         for element in predictions
     38     ]
     39     return {"predictions": predictions}

File c:\users\w i n d o w s\pycharmprojects\pythonproject\venv\lib\site-packages\evaluate\evaluator\text_classification.py:36, in <listcomp>(.0)
     34 def predictions_processor(self, predictions, label_mapping):
     35     predictions = [
---> 36         label_mapping[element["label"]] if label_mapping is not None else element["label"]
     37         for element in predictions
     38     ]
     39     return {"predictions": predictions}

KeyError: 'LABEL_1'

But when I change the value of label_mapping to what the text classification docs say, it still doesn’t work. Why can’t the evaluator take the label mapping from the dataset?

from datasets import load_dataset
from evaluate import evaluator
from transformers import AutoModelForSequenceClassification, pipeline

data = load_dataset("indonlu","smsa", split="test").shuffle(seed=42).select(range(500))
task_evaluator = evaluator("text-classification")

eval_results = task_evaluator.compute(
    model_or_pipeline="mdhugol/indonesia-bert-sentiment-classification",
    data=data,
    # this also failed
    label_mapping={'LABEL_0': 'positive', 'LABEL_1': 'neutral', 'LABEL_2': 'negative'},
    # this also didn't work
    # label_mapping={"NEGATIVE": 0, "POSITIVE": 1},
    # this threw an error even though it's the same as the dataset labels
    # label_mapping={"positive": 0, "neutral": 1, "negative": 2},
)
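
From the traceback above, my guess is that the keys of label_mapping have to be the strings the pipeline emits ("LABEL_0", "LABEL_1", "LABEL_2") and the values the dataset’s integer ids, so I would expect something like this to be the intended shape (assuming the model card’s LABEL_0 = positive, LABEL_1 = neutral, LABEL_2 = negative, which I may have wrong):

# keys = strings the pipeline returns, values = smsa's integer ids
eval_results = task_evaluator.compute(
    model_or_pipeline="mdhugol/indonesia-bert-sentiment-classification",
    data=data,
    label_mapping={"LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2},
)
print(eval_results)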

I also cannot use the Model Evaluator (a Hugging Face Space by autoevaluate), because the model I want to check the score for is not among the options.