Most object detection models are trained at 640x640 resolution. I want to fine-tune a model to detect my classes of interest, but the camera I will be using is a 16:9 camera. (1) Is it possible, and (2) is it a good idea, to fine-tune my model with a 16:9 input resolution (such as 540x960)?
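To make the concern concrete, here is a small arithmetic sketch of what happens to the aspect ratio when a 16:9 frame is resized without padding. The 1920x1080 camera frame is an assumption for illustration, and the helper names are hypothetical.

```python
def resize_scales(src_wh, dst_wh):
    """Return (scale_x, scale_y) applied by a plain resize with no padding."""
    src_w, src_h = src_wh
    dst_w, dst_h = dst_wh
    return dst_w / src_w, dst_h / src_h


def distortion(src_wh, dst_wh):
    """Ratio scale_y / scale_x; 1.0 means the aspect ratio is preserved."""
    scale_x, scale_y = resize_scales(src_wh, dst_wh)
    return scale_y / scale_x


camera = (1920, 1080)  # assumed 16:9 camera frame as (width, height)
square = (640, 640)    # the usual square training resolution
wide = (960, 540)      # the 540x960 (height x width) input from the question

print("square:", distortion(camera, square))  # objects stretched vertically
print("wide:  ", distortion(camera, wide))    # uniform scaling
```

With a square target, objects are stretched vertically by roughly 16/9 relative to their width, while the 540x960 target scales both axes by the same factor.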
From what I can tell, the `Trainer`'s `TrainingArguments` do not have arguments for controlling the input image resolution. Does it train with whatever resolution the input images have (which would likely be set in the pre-processing step)?
Note: For inference, you can use the `size` argument when initializing the processor.
Note: I am interested in the RT-DETRv2 architecture, but feel free to answer for a similar architecture like DETR if you are familiar with it.
Note: From what I can tell, you can change the input image size during training with YOLO/Ultralytics, but I'm not sure if the same concept applies to DETR object detectors.
Thanks!