How to use customized compute_metrics in trainer

I am doing SFT with GSM math data, I want to have the trainer calculate the actual final answer accuracy for me during evaluation on validation dataset. Is there a way to do that with compute_metric function?

Thanks!!

Yeah. There’s no built-in metric for that, but the Trainer takes a compute_metrics callback and many people write their own, so I think that’s the safest option.


An example by HuggingChat:

Yes, you can use the compute_metrics function in Hugging Face’s Trainer to calculate the final answer accuracy for your GSM math data during evaluation on the validation dataset. Here’s how:

  1. Define Your Metrics Function: Write a custom compute_metrics function that receives the model’s predictions and the labels (ground-truth answers). For math datasets like GSM8K, this boils down to comparing each predicted final answer to the reference answer.

  2. Extract Numerical Answers: Since the model outputs are text strings, you need to parse the numerical answer out of the generated text. For example, if the model outputs “The answer is 42,” you’ll need to extract “42” and compare it to the correct answer (see the extraction sketch after this list).

  3. Calculate Accuracy: Once you have the predicted and reference numerical answers, compute accuracy as the fraction of examples where they match exactly.

  4. Return Metrics: Return the computed metrics (e.g., accuracy) as a dictionary from compute_metrics. The Trainer will then automatically log these metrics during evaluation.
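
The extract_number helper referenced above isn’t part of transformers; here’s a minimal sketch of one, assuming the final answer is the last number in the generated text (GSM8K answers sometimes contain thousands separators, hence the comma stripping):

import re

def extract_number(text):
    # Strip thousands separators, then take the last integer or decimal
    # in the text as the final answer, e.g. "The answer is 42." -> "42"
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None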

Here’s a simplified example of what the compute_metrics function might look like (assuming the predictions and labels have already been decoded into strings):

def compute_metrics(eval_pred):
    # eval_pred is a transformers.EvalPrediction; here we assume
    # .predictions and .label_ids are already decoded answer strings
    predictions, labels = eval_pred.predictions, eval_pred.label_ids
    # Extract the numerical answer from each text
    predicted_answers = [extract_number(pred) for pred in predictions]
    reference_answers = [extract_number(label) for label in labels]
    # Accuracy = fraction of examples whose extracted answers match
    correct = sum(1 for pred, ref in zip(predicted_answers, reference_answers) if pred == ref)
    return {"accuracy": correct / len(reference_answers)}

Once you pass this function to the Trainer via its compute_metrics argument, it will calculate and log your model’s answer accuracy every time it evaluates on the validation dataset.
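
Note that with a causal-LM Trainer, the predictions handed to compute_metrics are logits (or token ids), not strings, so in practice you also need an argmax and a decode step. Below is a minimal wiring sketch, not a definitive implementation: it assumes you already have model, tokenizer, and tokenized train_dataset/eval_dataset in scope, and it uses the Trainer’s preprocess_logits_for_metrics hook so the full vocabulary-sized logit tensor doesn’t pile up in memory:

import numpy as np
from transformers import Trainer, TrainingArguments

def preprocess_logits_for_metrics(logits, labels):
    # Keep only the predicted token ids instead of the full logits
    return logits.argmax(dim=-1)

def compute_metrics(eval_pred):
    pred_ids, label_ids = eval_pred.predictions, eval_pred.label_ids
    # Positions masked with -100 are ignored by the loss; swap in the
    # pad token so they decode cleanly
    label_ids = np.where(label_ids != -100, label_ids, tokenizer.pad_token_id)
    pred_texts = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
    label_texts = tokenizer.batch_decode(label_ids, skip_special_tokens=True)
    correct = sum(
        1 for p, l in zip(pred_texts, label_texts)
        if extract_number(p) is not None and extract_number(p) == extract_number(l)
    )
    return {"accuracy": correct / len(label_texts)}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", eval_strategy="epoch"),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    preprocess_logits_for_metrics=preprocess_logits_for_metrics,
)

One caveat: this scores teacher-forced next-token predictions. For true final-answer accuracy you would generate completions (e.g. with model.generate) and extract the answer from the generated text, but the compute_metrics flow above is the standard place to plug the comparison in.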
