Object detection resolution fine-tuning

(1) Is it possible?

As long as the image processor is configured properly, there shouldn’t be any major problems. Some models, such as CLIP, seem to have hard-coded resolutions, but otherwise something like this should be fine:

from transformers import DetrImageProcessor

# Override the resize/pad settings that ship with the checkpoint.
image_processor = DetrImageProcessor.from_pretrained(
    "facebook/detr-resnet-50",
    do_resize=True,
    size={"height": 540, "width": 960},   # ← your 16:9 resolution
    default_to_square=False,
    do_pad=True,
    pad_size={"height": 540, "width": 960},  # pad every image to the same fixed size
)
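One thing worth checking before fine-tuning: if the processor resizes to exactly the `height`/`width` given in `size` (an assumption here, rather than preserving the aspect ratio), any source image that is not 16:9 will be stretched. The sketch below is plain Python, independent of transformers, and the helper names `resize_scales` and `aspect_distortion` are made up for illustration. It only computes the per-axis scale factors so you can see how much distortion a given source resolution would pick up.

```python
def resize_scales(orig_width, orig_height, target_width=960, target_height=540):
    """Return (scale_x, scale_y) for a resize to exactly the target size."""
    return target_width / orig_width, target_height / orig_height


def aspect_distortion(orig_width, orig_height, target_width=960, target_height=540):
    """Return scale_x / scale_y; 1.0 means object shapes are preserved."""
    scale_x, scale_y = resize_scales(orig_width, orig_height, target_width, target_height)
    return scale_x / scale_y


# Compare a 16:9 source with a 4:3 source (example resolutions, not from the question).
for width, height in [(1920, 1080), (1280, 960)]:
    sx, sy = resize_scales(width, height)
    ratio = aspect_distortion(width, height)
    print(f"{width}x{height}: scale_x={sx:.4f}, scale_y={sy:.4f}, distortion={ratio:.4f}")
```

A distortion of 1.0 means the image is only scaled. Anything else means objects are squeezed or stretched, in which case cropping or letterboxing the source to 16:9 first may be the safer choice.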

(2) Is it a good idea?

There does not seem to be much of a negative impact on accuracy. However, since the pretrained weights were learned at a different input size and aspect ratio (square inputs, for many models), you will probably need to fine-tune thoroughly on your own data at the new resolution.