I am trying to finetune RT-DETR on two GPUs following this script. The per-GPU batch size is 8, so with 2 GPUs the effective batch size is 16.
When reaching the `compute_metrics` function, I get a mismatch between `output.logits` and `target_sizes`: the batch dimension of `output.logits` is 8 while that of `target_sizes` is 16. This is the stacktrace message:
File "/home/jb/.cache/pypoetry/virtualenvs/ml-Mf12zaqr-py3.11/lib/python3.11/site-packages/transformers/models/rt_detr/image_processing_rt_detr.py", line 1062, in post_process_object_detection
raise ValueError(
ValueError: Make sure that you pass in as many target sizes as the batch dimension of the logits
50%|████████████████████████████████████████                                        | 77/154 [00:39<00:39, 1.95it/s]
I suspect that the `target_sizes` tensor is gathering the images from all devices when maybe it shouldn't? I would appreciate any help!
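In case it helps clarify the symptom: the shapes suggest the evaluation loop gathered `target_sizes` across both GPUs (8 + 8 = 16) while the logits passed to post-processing are still per-device (8). Below is a minimal sketch of a defensive workaround I am considering, with `align_target_sizes` being a hypothetical helper (not part of transformers) that slices the gathered sizes down to the logits batch dimension before calling `post_process_object_detection`. Plain lists stand in for the actual tensors:

```python
def align_target_sizes(logits, target_sizes):
    """Hypothetical workaround: if target_sizes was gathered across
    devices but logits are still per-device, truncate target_sizes
    to match the logits batch dimension."""
    batch = len(logits)
    if len(target_sizes) != batch:
        # Keep only the sizes belonging to this device's slice.
        target_sizes = target_sizes[:batch]
    return target_sizes

# Reproduce the mismatch from the traceback:
logits = [[0.0] * 4 for _ in range(8)]   # stands in for output.logits, batch dim 8
target_sizes = [(640, 640)] * 16         # gathered across 2 GPUs -> batch dim 16

aligned = align_target_sizes(logits, target_sizes)
print(len(aligned))  # prints 8
```

I am not sure truncating is the right fix, though; it assumes the first 8 entries correspond to this device's batch, which may not hold depending on how the gather interleaves devices.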