The reason (I ran into this bug too) is that the Trainer post-processes the label_ids attribute and makes it unusable for detection metrics: if you look into it, you'll see that all the image ids from a batch end up lumped together and all the boxes lumped together, so the per-image structure is lost. Anyway, Hugging Face seems to care a bit less about CV.
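To make the mismatch concrete, here is a rough sketch (tensor values and shapes invented) of the per-image label dicts a detection metric needs versus the kind of flattened structure the evaluation loop hands over, as described above:

import torch

# What the detection metric needs: one dict per image, boundaries preserved.
per_image_labels = [
    {"class_labels": torch.tensor([3]),    "boxes": torch.rand(1, 4)},  # image 0: 1 object
    {"class_labels": torch.tensor([1, 7]), "boxes": torch.rand(2, 4)},  # image 1: 2 objects
]

# Roughly what the post-processed label_ids look like instead: all ids together,
# all boxes together, no way to tell which box belongs to which image.
flattened_labels = {
    "class_labels": torch.tensor([3, 1, 7]),
    "boxes": torch.rand(3, 4),
}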
Here's the ugly solution that works (I did it).
- Update transformers to the newest version
- Open the trainer.py file in the transformers library: "wherever_you_have_your_python/lib/python3.11/site-packages/transformers/trainer.py"
- Go all the way down to line 3253 and on the next line add all_eval_labels = []
- Then go to the for loop 11 lines down and, WITHIN THAT LOOP, add all_eval_labels.extend(inputs["labels"])
- Go further down to line 3371 and, just before the line if self.compute_metrics is not None and all_preds is not None and all_labels is not None:, add all_labels = all_eval_labels
It should look like this:
...
all_labels = all_eval_labels
# Metrics!
if self.compute_metrics is not None and all_preds is not None and all_labels is not None:
...
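Put together, the three edits boil down to the pattern in this toy version of the evaluation loop. It only illustrates the structure: toy_evaluation_loop, eval_dataloader, and predict_fn are placeholder names, not the real trainer.py internals.

from transformers import EvalPrediction


def toy_evaluation_loop(eval_dataloader, predict_fn, compute_metrics):
    all_preds = []
    all_eval_labels = []                              # edit 1: new accumulator

    for inputs in eval_dataloader:                    # edit 2 goes inside this loop
        all_preds.append(predict_fn(inputs))
        all_eval_labels.extend(inputs["labels"])      # keep the per-image label dicts intact

    all_labels = all_eval_labels                      # edit 3: override the processed labels
    if compute_metrics is not None and all_preds is not None and all_labels is not None:
        return compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))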
Voilà! The first part is done. Now for the compute_metrics method.
import torch
from torchmetrics.detection import MeanAveragePrecision
from transformers import EvalPrediction


def compute_metrics(eval_pred: EvalPrediction):
    """Compute detection metrics."""
    _, scores, pred_boxes, last_hidden_state, encoder_last_hidden_state = eval_pred.predictions
    # scores shape: (number of samples, number of detected anchors, num_classes + 1), last class is the no-object class
    # pred_boxes shape: (number of samples, number of detected anchors, 4)
    # https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/detr-resnet50/README.md
    predictions = []
    for score, box in zip(scores, pred_boxes):
        # Extract the bounding boxes, labels, and scores from the model's output
        pred_scores = torch.from_numpy(score[:, :-1])  # Exclude the no-object class
        boxes = torch.from_numpy(box)
        pred_labels = torch.argmax(pred_scores, dim=-1)
        # Get the scores corresponding to the predicted labels
        pred_scores_for_labels = torch.gather(pred_scores, 1, pred_labels.unsqueeze(-1)).squeeze(-1)
        predictions.append(
            {
                "boxes": boxes,
                "scores": pred_scores_for_labels,
                "labels": pred_labels,
            }
        )
    target = [
        {
            "boxes": eval_pred.label_ids[i]["boxes"].detach().cpu(),
            "labels": eval_pred.label_ids[i]["class_labels"].detach().cpu(),
        }
        for i in range(len(eval_pred.label_ids))
    ]
    metric = MeanAveragePrecision(box_format="xywh")
    metric.update(preds=predictions, target=target)
    results = metric.compute()
    results = {k: v.tolist() if isinstance(v, torch.Tensor) else v for k, v in results.items()}
    return results
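If you want to sanity-check the metric function without running a full evaluation, you can feed it a fake EvalPrediction. The shapes below are made up; only the five-element layout of predictions and the boxes/class_labels keys matter:

import numpy as np
import torch
from transformers import EvalPrediction

num_images, num_anchors, num_classes = 2, 5, 3
fake_predictions = (
    np.zeros((num_images, num_anchors, num_classes + 1), dtype=np.float32),       # ignored ("_")
    np.random.rand(num_images, num_anchors, num_classes + 1).astype(np.float32),  # scores incl. no-object class
    np.random.rand(num_images, num_anchors, 4).astype(np.float32),                # pred_boxes
    np.zeros((num_images, num_anchors, 8), dtype=np.float32),                     # last_hidden_state (unused here)
    np.zeros((num_images, num_anchors, 8), dtype=np.float32),                     # encoder_last_hidden_state (unused here)
)
fake_labels = [
    {"boxes": torch.rand(3, 4), "class_labels": torch.randint(0, num_classes, (3,))}
    for _ in range(num_images)
]
print(compute_metrics(EvalPrediction(predictions=fake_predictions, label_ids=fake_labels)))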
Then in my trainer I just pass this method as the compute_metrics parameter:
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=lambda batch: collate_fn(batch, is_yolo),
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    tokenizer=image_processor,
)
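With that in place, a plain evaluate() call should return the detection metrics alongside the usual eval entries (a quick usage sketch for the trainer above):

metrics = trainer.evaluate()
print(metrics)  # should now include the mAP/mAR values, prefixed with "eval_" by the Trainer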
It will work, but it's not a perfect solution, and you should remember to revert the change if you decide to tackle other tasks like NLP.
Cheers!