I know timm uses position-encoding interpolation to handle arbitrary input sizes, but is it possible to do this in Transformers? I could add the interpolation manually, but I'm kind of a beginner, so I don't really know how. I need CLIP ViTs to support non-square images…
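For reference, here is a minimal sketch of the usual manual approach, which follows the same basic idea as timm's resampling. It assumes a learned absolute position table of shape (1 + N, D), where row 0 is the class-token position and the remaining N rows form a square patch grid (the layout CLIP's vision embeddings use). `interpolate_pos_embed` is a hypothetical helper, not a Transformers API. It reshapes the patch rows into a 2D grid, resizes that grid bicubically to the new (rows, cols) with `torch.nn.functional.interpolate`, and flattens it back. The shapes in the usage lines are illustrative, assuming a ViT-B/32-style model at 224x224 (7x7 grid, width 768) fed a 224x320 image (7x10 grid).

```python
import math

import torch
import torch.nn.functional as F


def interpolate_pos_embed(pos_embed: torch.Tensor, new_grid_hw: tuple) -> torch.Tensor:
    """Hypothetical helper: resize a (1 + N, D) position table to a (rows, cols) patch grid.

    Assumes row 0 is the class-token position and rows 1..N form a square grid.
    Returns a tensor of shape (1 + rows * cols, D).
    """
    cls_pos = pos_embed[:1]    # (1, D) class-token position, kept unchanged
    patch_pos = pos_embed[1:]  # (N, D) patch positions, flattened row-major
    n, dim = patch_pos.shape
    old_side = math.isqrt(n)
    if old_side * old_side != n:
        raise ValueError("expected a square grid of patch positions")

    # (N, D) -> (1, D, side, side) so interpolate treats D as channels
    grid = patch_pos.reshape(old_side, old_side, dim).permute(2, 0, 1).unsqueeze(0)
    grid = F.interpolate(grid, size=new_grid_hw, mode="bicubic", align_corners=False)

    # (1, D, rows, cols) -> (rows * cols, D), row-major like the original table
    new_patch_pos = grid.squeeze(0).permute(1, 2, 0).reshape(-1, dim)
    return torch.cat([cls_pos, new_patch_pos], dim=0)


# Illustrative shapes: 7x7 grid + class token at width 768, resized to a 7x10 grid
pos = torch.randn(50, 768)
new_pos = interpolate_pos_embed(pos, (7, 10))
print(new_pos.shape)  # torch.Size([71, 768])
```

Wiring this into a Transformers CLIP model means replacing the vision embeddings' position table and the position ids used to index it, so check how `CLIPVisionEmbeddings.forward` builds position embeddings in your installed version, since that code may differ between releases. Recent releases may also already expose an `interpolate_pos_encoding` argument on CLIP vision models; the model docs for your version will say whether it is available.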