Add metrics to object detection example

The reason why (I also had this bug) is that the Trainer post-processes the label_ids attribute and makes it unintelligible for detection: if you look into it, you'll see that all the image ids end up batched together and all the boxes batched together, so the per-image grouping is lost. Anyway, Hugging Face seems to care a bit less about CV :wink:
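
To make it concrete, here is a rough sketch of the mismatch (the "class_labels"/"boxes" keys are the DETR-style label dicts used in the object detection example; the values are illustrative):

import torch

# What the detection metrics need: one label dict per image
per_image_label = {
    "class_labels": torch.tensor([3, 7]),              # one class id per object in the image
    "boxes": torch.tensor([[0.5, 0.5, 0.2, 0.1],
                           [0.1, 0.9, 0.3, 0.3]]),     # one box per object
}
# What compute_metrics receives by default: label_ids with the class labels and boxes
# of the whole eval set concatenated per key, so you can no longer tell which boxes
# belong to which image.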

Here’s the ugly solution which works (I did it).

  1. Update transformers to the newest version.
  2. Open the trainer.py file in the transformers library: “wherever_you_have_your_python/lib/python3.11/site-packages/transformers/trainer.py”.
  3. Go all the way to line 3253 and, on the next line, add all_eval_labels = [].
  4. Then go to the for loop 11 lines below and, WITHIN THAT LOOP, add all_eval_labels.extend(inputs["labels"]).
  5. Go further down to line 3371 and, just before if self.compute_metrics is not None and all_preds is not None and all_labels is not None:, add all_labels = all_eval_labels.
    It should look like this (a condensed sketch of the whole patched region follows the excerpt):
...
all_labels = all_eval_labels
# Metrics!
if self.compute_metrics is not None and all_preds is not None and all_labels is not None:
...
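
Putting the three additions together, the patched region of evaluation_loop looks roughly like this (a sketch from memory, so the exact surrounding code and line numbers will differ between transformers versions; anchor on the code, not on the numbers):

all_eval_labels = []                                  # (1) added just before the prediction loop

for step, inputs in enumerate(dataloader):            # the existing evaluation loop
    ...
    all_eval_labels.extend(inputs["labels"])          # (2) keep the raw per-sample label dicts
    ...

all_labels = all_eval_labels                          # (3) override the concatenated labels
# Metrics!
if self.compute_metrics is not None and all_preds is not None and all_labels is not None:
    ...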

Voilà! The first part is done. Now for the compute_metrics method.

import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision
from transformers import EvalPrediction


def compute_metrics(eval_pred: EvalPrediction):
    """Compute detection metrics"""

    # last_hidden_state and encoder_last_hidden_state are not needed for the metrics
    _, scores, pred_boxes, last_hidden_state, encoder_last_hidden_state = eval_pred.predictions

    # scores shape: (number of samples, number of detected anchors, num_classes + 1) last class is the no-object class
    # pred_boxes shape: (number of samples, number of detected anchors, 4)
    # https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/detr-resnet50/README.md
    predictions = []
    for score, box in zip(scores, pred_boxes):
        # Extract the bounding boxes, labels, and scores from the model's output
        pred_scores = torch.from_numpy(score[:, :-1])  # Exclude the no-object class
        pred_box = torch.from_numpy(box)  # renamed so it doesn't shadow the outer pred_boxes
        pred_labels = torch.argmax(pred_scores, dim=-1)

        # Get the scores corresponding to the predicted labels
        pred_scores_for_labels = torch.gather(pred_scores, 1, pred_labels.unsqueeze(-1)).squeeze(-1)
        predictions.append(
            {
                "boxes": pred_boxes,
                "scores": pred_scores_for_labels,
                "labels": pred_labels,
            }
        )

    target = [
        {
            "boxes": eval_pred.label_ids[i]["boxes"].detach().cpu(),
            "labels": eval_pred.label_ids[i]["class_labels"].detach().cpu(),
        }
        for i in range(len(eval_pred.label_ids))
    ]
    metric = MeanAveragePrecision(box_format="xywh")  # renamed so it doesn't shadow the built-in map
    metric.update(preds=predictions, target=target)
    results = metric.compute()
    results = {k: v.tolist() if isinstance(v, torch.Tensor) else v for k, v in results.items()}
    return results
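
If you want to sanity-check the function before running a full evaluation, a quick smoke test with random dummy data (shapes only; the sizes below are made up) can be run like this:

import numpy as np

num_samples, num_queries, num_classes = 2, 100, 3
dummy_predictions = (
    np.zeros((num_samples, num_queries, num_classes + 1), dtype=np.float32),       # logits (ignored above)
    np.random.rand(num_samples, num_queries, num_classes + 1).astype(np.float32),  # scores
    np.random.rand(num_samples, num_queries, 4).astype(np.float32),                # pred_boxes
    None,                                                                          # last_hidden_state (ignored)
    None,                                                                          # encoder_last_hidden_state (ignored)
)
dummy_labels = [
    {"boxes": torch.rand(3, 4), "class_labels": torch.randint(0, num_classes, (3,))}
    for _ in range(num_samples)
]
print(compute_metrics(EvalPrediction(predictions=dummy_predictions, label_ids=dummy_labels)))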

Then in my trainer I just pass this method as the compute_metrics parameter:

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=lambda batch: collate_fn(batch, is_yolo),
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    tokenizer=image_processor,
)
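
One note: collate_fn and the is_yolo flag come from my own script and aren't shown here. For reference, a typical DETR-style collator (just a sketch, assuming image_processor is the checkpoint's image processor and each dataset item already holds pixel_values plus a per-image labels dict) looks roughly like this:

def collate_fn(batch, is_yolo=False):
    # Pad the images in the batch to a common size with the image processor
    pixel_values = [item["pixel_values"] for item in batch]
    encoding = image_processor.pad(pixel_values, return_tensors="pt")
    # Keep the labels as a list of per-image dicts ("class_labels", "boxes", ...)
    labels = [item["labels"] for item in batch]
    collated = {"pixel_values": encoding["pixel_values"], "labels": labels}
    if not is_yolo:
        # DETR-style models also expect a pixel_mask; YOLOS does not use one
        collated["pixel_mask"] = encoding["pixel_mask"]
    return collated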

It will work, but it's not a perfect solution, and you should remember to revert the trainer.py changes if you decide to tackle other tasks like NLP.
Cheers!
