Potential bug in the RT-DETR v2 fine-tuning script

Hi,

I have been running this script in Google Colab.

First of all, thank you very much for this, super clear!

I have noticed a potential bug when running the trainer.evaluate command.

The original works fine; however, as soon as I try to run it for a single record, or with a dataset size that produces a single-record batch (say 9 records, since the eval batch size defaults to 8), it fails.

ValueError                                Traceback (most recent call last)
in <cell line: 0>()
----> 1 metrics2 = trainer.evaluate(eval_dataset=t_dataset, metric_key_prefix="eval")

4 frames
in collect_targets(self, targets, image_sizes)
     33                 # here we have "yolo" format (x_center, y_center, width, height) in relative coordinates 0..1
     34                 # and we need to convert it to "pascal" format (x_min, y_min, x_max, y_max) in absolute coordinates
---> 35                 height, width = size
     36                 boxes = torch.tensor(target["boxes"])
     37                 boxes = center_to_corners_format(boxes)

ValueError: not enough values to unpack (expected 2, got 1)

If you want to recreate the error:

 t_dataset = CPPE5Dataset(dataset["test"].select([1]), image_processor, transform=validation_transform)

metrics = trainer.evaluate(eval_dataset=t_dataset, metric_key_prefix="eval")

or

t_dataset = CPPE5Dataset(dataset["test"].select(list(range(2))), image_processor, transform=validation_transform)

metrics = trainer.evaluate(eval_dataset=t_dataset, metric_key_prefix="eval")

After digging a bit more into the error, I realised that the evaluation seems to keep only the first index for all the values in the label dictionary when the batch size is one. That is why image_size contains only tensor([480]) instead of tensor([480, 480]).
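
A minimal illustration of the failing unpack (480 being the image size in my run):

import torch

height, width = torch.tensor([480, 480])  # normal batch: unpacks fine
height, width = torch.tensor([480])       # single-record batch
# ValueError: not enough values to unpack (expected 2, got 1)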

Hope you guys can help! Thank you


We ran into a similar issue. For now, the “fix” is to ensure that a batch can never have a single element in it. In our case the batch size is 8, so we make sure that validation_df never has a remainder of 1 when divided by 8.

e.g.

    # If the validation set would leave a final batch of exactly one
    # sample, move one sample over from the training set to pad it out.
    validation_size = len(validation_df)
    remainder = validation_size % 8
    if remainder == 1:
        sample_to_move = train_df.iloc[0:1]
        train_df = train_df.iloc[1:]
        validation_df = pd.concat([validation_df, sample_to_move]).reset_index(drop=True)
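
The same trick generalizes to any batch size; a hypothetical helper (untested sketch, the function name is ours):

    import pandas as pd

    # Move one training sample into validation whenever the final eval
    # batch would otherwise contain exactly one element.
    def avoid_singleton_final_batch(train_df, validation_df, batch_size=8):
        if len(validation_df) % batch_size == 1:
            sample_to_move = train_df.iloc[0:1]
            train_df = train_df.iloc[1:].reset_index(drop=True)
            validation_df = pd.concat([validation_df, sample_to_move]).reset_index(drop=True)
        return train_df, validation_df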

@nicholasgcoles
Since you’ve run into this issue at the evaluation stage, it seems you managed to train successfully. I’m running into a problem here:
from transformers import AutoModelForObjectDetection

model = AutoModelForObjectDetection.from_pretrained(
    checkpoint,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
)

RuntimeError: Error(s) in loading state_dict for RTDetrV2ForObjectDetection:
	size mismatch for model.denoising_class_embed.weight: copying a param with shape torch.Size([81, 256]) from checkpoint, the shape in current model is torch.Size([6, 256]).
	size mismatch for model.enc_score_head.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([5, 256]).
	size mismatch for model.enc_score_head.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
            ...

This seems to relate to the number of classes (COCO’s 80 vs CPPE-5’s 5). What’s odd is that ignore_mismatched_sizes=True doesn’t ignore the mismatch.
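
For reference, here is roughly what I’d expect ignore_mismatched_sizes=True to do. As an untested sketch (reusing checkpoint, id2label and label2id from the tutorial), one could filter out the shape-mismatched tensors manually:

from transformers import AutoConfig, AutoModelForObjectDetection

# Build a randomly initialized model with the 5 CPPE-5 classes.
config = AutoConfig.from_pretrained(checkpoint, id2label=id2label, label2id=label2id)
model = AutoModelForObjectDetection.from_config(config)

# Load the 80-class pretrained weights and keep only shape-compatible
# tensors, leaving the class-dependent heads randomly initialized.
pretrained = AutoModelForObjectDetection.from_pretrained(checkpoint)
model_state = model.state_dict()
filtered = {
    k: v
    for k, v in pretrained.state_dict().items()
    if k in model_state and v.shape == model_state[k].shape
}
model.load_state_dict(filtered, strict=False)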

Did either of you run into this?
cc @qubvel-hf as you authored this tutorial

Any help would be greatly appreciated

FYI I’m running the notebook on colab, setup below (though also ran into this on my local Ubuntu 20.04, x86 machine)
numpy version: 1.26.4
transformers version: 4.50.0.dev0
torch version: 2.5.1+cu124
Python 3.11.11
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU @ 2.20GHz


Seems related:

From what I can tell, this also affects D-FINE, which uses the same image processor as RT-DETR and RT-DETRv2 (RTDetrImageProcessor).

Is there any solution for the script?


If we want to force a fix in the script itself, it would be something like this…?

    def collect_targets(self, targets, image_sizes):
        post_processed_targets = []
        for target_batch, image_size_batch in zip(targets, image_sizes):
            # Workaround: a single-record batch arrives with its batch
            # dimension squeezed away, so restore it before iterating.
            if image_size_batch.ndim == 1:
                image_size_batch = image_size_batch.unsqueeze(0)
            for target, size in zip(target_batch, image_size_batch):

                # here we have "yolo" format (x_center, y_center, width, height) in relative coordinates 0..1
                # and we need to convert it to "pascal" format (x_min, y_min, x_max, y_max) in absolute coordinates
                height, width = size
                boxes = torch.tensor(target["boxes"])
                boxes = center_to_corners_format(boxes)
                boxes = boxes * torch.tensor([[width, height, width, height]])
                # ... (rest of the method unchanged)

To fix it fundamentally, a patch upstream is probably necessary, but it seems the issue has already been raised there and remains unresolved.
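
In the meantime, a mitigation that avoids touching the metrics code at all might be to drop the incomplete final batch. This is an assumption on my side (recent Trainer versions forward dataloader_drop_last to the eval dataloader as well), and the trade-off is that the leftover samples are simply not evaluated:

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="rtdetr-v2-finetune",  # hypothetical output dir
        per_device_eval_batch_size=8,
        dataloader_drop_last=True,  # drop the last batch when len(dataset) % 8 != 0
    )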