Potential bug in the rt-detr v2 fine tune script

Hi,

I have been running this script in google colab.

First of all, thank you very much for this, super clear!

I have noticed a potential bug when running the trainer.evaluate command.

The original works fine, however as soon as I try to run it for a single record, or with a dataset size that leaves a final batch of one record (say 9 records, since the eval batch size defaults to 8), it fails.

ValueError                                Traceback (most recent call last)
in <cell line: 0>()
----> 1 metrics2 = trainer.evaluate(eval_dataset=t_dataset, metric_key_prefix="eval")

4 frames
in collect_targets(self, targets, image_sizes)
     33                 # here we have "yolo" format (x_center, y_center, width, height) in relative coordinates 0..1
     34                 # and we need to convert it to "pascal" format (x_min, y_min, x_max, y_max) in absolute coordinates
---> 35                 height, width = size
     36                 boxes = torch.tensor(target["boxes"])
     37                 boxes = center_to_corners_format(boxes)

ValueError: not enough values to unpack (expected 2, got 1)

If you want to recreate the error:

 t_dataset = CPPE5Dataset(dataset["test"].select([1]), image_processor, transform=validation_transform)

metrics = trainer.evaluate(eval_dataset=t_dataset, metric_key_prefix="eval")

or

t_dataset = CPPE5Dataset(dataset["test"].select(list(range(2))), image_processor, transform=validation_transform)

metrics = trainer.evaluate(eval_dataset=t_dataset, metric_key_prefix="eval")

After diving a bit more into the error, I realised that when the batch size is one, the model prediction seems to keep only the first index of each value in the label dictionary. That is why image_size only contains tensor([480]) instead of tensor([480, 480]).
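For what it's worth, the unpacking failure is easy to reproduce in isolation, and a defensive unpack would sidestep it. This is only a sketch of a possible guard (`safe_unpack_size` is a hypothetical helper, not part of the script):

```python
import torch

def safe_unpack_size(size):
    # Hypothetical guard: when a batch of one gets squeezed somewhere upstream,
    # `size` can arrive as tensor([480]) instead of tensor([480, 480]).
    size = torch.atleast_1d(torch.as_tensor(size))
    if size.numel() == 1:
        # assume a square image when only one dimension survived the squeeze
        height = width = int(size.item())
    else:
        height, width = size.tolist()
    return height, width

print(safe_unpack_size(torch.tensor([480])))       # (480, 480)
print(safe_unpack_size(torch.tensor([480, 640])))  # (480, 640)
```

A cleaner fix would of course be to stop the squeeze from happening in the first place, but a guard like this at least makes the evaluation robust to singleton batches.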

Hope you guys can help! Thank you


We ran into a similar issue. For now the “fix” is to ensure that a batch can never contain exactly one element. In our case the batch size is 8, so we make sure that validation_df never has a remainder of 1 when its length is divided by 8.

e.g.

    # if the validation set would end in a batch of exactly one,
    # move a single sample over from the training set
    validation_size = len(validation_df)
    remainder = validation_size % 8
    if remainder == 1:
        sample_to_move = train_df.iloc[0:1]
        train_df = train_df.iloc[1:]
        validation_df = pd.concat([validation_df, sample_to_move]).reset_index(drop=True)
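The same remainder check can be wrapped into a small helper; this is just a sketch with hypothetical names (`pad_validation_split` is not from the tutorial):

```python
import pandas as pd

def pad_validation_split(train_df, validation_df, batch_size=8):
    # Move one training row into validation whenever the validation
    # split would otherwise end in a batch of exactly one element.
    if len(validation_df) % batch_size == 1:
        sample_to_move = train_df.iloc[0:1]
        train_df = train_df.iloc[1:]
        validation_df = pd.concat([validation_df, sample_to_move]).reset_index(drop=True)
    return train_df, validation_df

train = pd.DataFrame({"x": range(10)})
val = pd.DataFrame({"x": range(9)})  # 9 % 8 == 1, so the last batch would be a singleton
train, val = pad_validation_split(train, val)
print(len(train), len(val))  # 9 10
```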

@nicholasgcoles
Since you’ve run into this issue at the evaluation stage, it seems you’ve managed to train successfully. I’m running into a problem here:
from transformers import AutoModelForObjectDetection

model = AutoModelForObjectDetection.from_pretrained(
    checkpoint,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
)

RuntimeError: Error(s) in loading state_dict for RTDetrV2ForObjectDetection:
	size mismatch for model.denoising_class_embed.weight: copying a param with shape torch.Size([81, 256]) from checkpoint, the shape in current model is torch.Size([6, 256]).
	size mismatch for model.enc_score_head.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([5, 256]).
	size mismatch for model.enc_score_head.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
            ...

This seems to relate to the number of classes (COCO’s 80 vs CPPE-5’s 5). What’s odd is that ignore_mismatched_sizes=True doesn’t ignore this.
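In case it helps while this gets sorted out: a common generic workaround for shape mismatches is to copy only the checkpoint tensors whose shapes match the current model, leaving the mismatched heads freshly initialised. This is just a sketch of that idea on a toy module (`load_matching` is a hypothetical helper, not a transformers API):

```python
import torch
import torch.nn as nn

def load_matching(model, source_state):
    # Copy only parameters whose names and shapes match; mismatched heads
    # (e.g. an 80-class COCO classifier vs a 5-class CPPE-5 one) are skipped
    # and keep their fresh random initialisation.
    own = model.state_dict()
    matched = {k: v for k, v in source_state.items()
               if k in own and own[k].shape == v.shape}
    model.load_state_dict(matched, strict=False)
    return sorted(set(source_state) - set(matched))  # names that were skipped

coco_head = nn.Linear(256, 80)  # stands in for a checkpoint's 80-class head
cppe_model = nn.Sequential(nn.Linear(16, 256), nn.Linear(256, 5))
skipped = load_matching(cppe_model, {"1.weight": coco_head.weight.data,
                                     "1.bias": coco_head.bias.data})
print(skipped)  # ['1.bias', '1.weight'] – the mismatched classifier weights
```

With transformers, the equivalent would be building the model from a config carrying the new id2label/label2id and filtering the pretrained state dict the same way, but ignore_mismatched_sizes=True is supposed to do exactly this, so the error above looks like a bug.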

Did either of you run into this?
cc @qubvel-hf as you authored this tutorial

Any help would be greatly appreciated

FYI I’m running the notebook on Colab, setup below (though I also ran into this on my local Ubuntu 20.04, x86 machine):
numpy version: 1.26.4
transformers version: 4.50.0.dev0
torch version: 2.5.1+cu124
Python 3.11.11
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU @ 2.20GHz


Seems related: