Is it (1) possible
As long as the image processor handles the size correctly, there shouldn't be any major problems. Some models, such as CLIP, seem to have hard-coded resolutions, but otherwise something like this should be fine:
from transformers import DetrImageProcessor

image_processor = DetrImageProcessor.from_pretrained(
    "facebook/detr-resnet-50",
    do_resize=True,
    size={"height": 540, "width": 960},  # your 16:9 resolution
    default_to_square=False,
    do_pad=True,
    pad_size={"height": 540, "width": 960},
)
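One thing to watch: if some of your source images are not 16:9, resizing them straight to 960x540 will stretch them. Below is a rough sketch of the arithmetic for fitting an image inside the target while keeping its aspect ratio, then padding the rest. The helper name `letterbox_dims` is made up for illustration, it is not part of transformers, and putting the padding on the right and bottom is an assumption on my part.

```python
def letterbox_dims(src_w, src_h, target_w=960, target_h=540):
    """Fit (src_w, src_h) inside (target_w, target_h) without distortion.

    Hypothetical helper, not a transformers API.
    Returns (new_w, new_h, pad_right, pad_bottom).
    """
    # Use the smaller scale factor so both sides fit inside the target.
    scale = min(target_w / src_w, target_h / src_h)
    # Clamp to the target in case rounding overshoots by a pixel.
    new_w = min(target_w, round(src_w * scale))
    new_h = min(target_h, round(src_h * scale))
    # Whatever is left over is filled with padding (assumed right/bottom).
    return new_w, new_h, target_w - new_w, target_h - new_h


if __name__ == "__main__":
    for w, h in [(1920, 1080), (640, 480), (1000, 1000)]:
        print((w, h), "->", letterbox_dims(w, h))
```

If every input is already 16:9, the padding comes out as zero and the resize alone is enough.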
and is it a good idea (2)
There does not seem to be much of a negative impact on accuracy. However, since the pretrained weights were learned on square inputs, you may need to fine-tune thoroughly on your own.