I fine-tuned a model and my validation metrics are an order of magnitude higher than the metrics on the test set. I know this is possible in principle, but such a large gap seems extreme to me. I’ve also noticed that generation is very sensitive to the generation parameters (e.g. repetition_penalty, min_length, max_length).
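For context, this is roughly what I do on the test set (a simplified sketch, not my exact script; the checkpoint name, test_texts, and all parameter values are placeholders, and I’m using a seq2seq-style setup here just to illustrate):

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# placeholders for my fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained("my-finetuned-checkpoint")
model = AutoModelForSeq2SeqLM.from_pretrained("my-finetuned-checkpoint")
model.eval()

# test_texts: list of raw test inputs (placeholder)
inputs = tokenizer(test_texts, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    generated = model.generate(
        **inputs,
        max_length=128,           # placeholder value
        min_length=10,            # placeholder value
        repetition_penalty=1.2,   # placeholder value
        num_beams=4,              # placeholder value
    )

predictions = tokenizer.batch_decode(generated, skip_special_tokens=True)

Changing any of these generation arguments moves the test metrics noticeably, which is why I want to know exactly which settings are used during validation.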
So I’m trying to understand how exactly prediction happens on the validation set. The only mention of compute_metrics I could find in the Trainer source code is here:
# later use `self.model is self.model_wrapped` to check if it's wrapped or not
self.model_wrapped = model
self.model = model
self.compute_metrics = compute_metrics
and then it appears in `evaluate()`:
eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
output = eval_loop(
    eval_dataloader,
    description="Evaluation",
    # No point gathering the predictions if there are no metrics, otherwise we defer to
    # self.args.prediction_loss_only
    prediction_loss_only=True if self.compute_metrics is None else None,
    ignore_keys=ignore_keys,
    metric_key_prefix=metric_key_prefix,
)
I’m having trouble understanding how the prediction actually happens inside this loop. I’d like to verify that I’m using exactly the same generation parameters when I predict on the test set, so I can investigate the difference in metrics.
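If I understand the API correctly, something like the sketch below would push both validation and test prediction through the same generation settings. This assumes Seq2SeqTrainer with predict_with_generate=True; the model, tokenizer, datasets, metric, and parameter values are placeholders (model and tokenizer as in the snippet above). Is this the right way to guarantee identical generation parameters for both splits?

import numpy as np
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

def compute_metrics(eval_pred):
    preds, labels = eval_pred
    # labels use -100 for ignored positions; restore pad tokens before decoding
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    return {"my_metric": 0.0}  # placeholder for my actual metric

args = Seq2SeqTrainingArguments(
    output_dir="out",
    predict_with_generate=True,  # evaluation goes through model.generate()
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    eval_dataset=val_ds,    # placeholder
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

# pass the same generation settings to both calls so the metrics are comparable
val_metrics = trainer.evaluate(max_length=128, num_beams=4)
test_output = trainer.predict(test_ds, metric_key_prefix="test", max_length=128, num_beams=4)
print(val_metrics, test_output.metrics)

(I’ve seen that more recent versions also expose generation_max_length / generation_num_beams on Seq2SeqTrainingArguments, but I’m not sure which of these actually control the validation-time generation, or where repetition_penalty and min_length come from in that path.)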