I am training a CLIPVisionModel using images of size 704x704, so I changed the configuration of the CLIPVisionModel to accept an input size of 704x704. The model configuration changes as expected:
from transformers import CLIPVisionConfig, CLIPVisionModelWithProjection

# Build a config with the larger input resolution and initialize a model from it
configuration = CLIPVisionConfig()
configuration.image_size = 704
vision_model = CLIPVisionModelWithProjection(configuration)
print(vision_model.config.image_size)
Output:
704
However, when I now call from_pretrained on this model, the configuration reverts to the default input size of 224:
configuration = CLIPVisionConfig()
configuration.image_size = 704
# Note: from_pretrained is a classmethod, so chaining it like this discards
# the freshly constructed instance along with its custom configuration
vision_model = CLIPVisionModelWithProjection(configuration).from_pretrained("openai/clip-vit-base-patch32")
print(vision_model.config.image_size)
Output:
224
I am guessing that when I call from_pretrained, it also loads the config file from the pretrained checkpoint, overwriting mine. However, I only want to change the input shape while still initializing from the pretrained weights. Given the structure of a transformer, I believe it should be possible to use the same model with a different input shape. Can someone help me figure this out?
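For reference, here is the direction I am considering, based on my reading of the from_pretrained docs: extra keyword arguments seem to be forwarded to the loaded config, and ignore_mismatched_sizes=True should let the checkpoint load even though the position embeddings no longer match the new resolution. I have not verified this is the right approach:

from transformers import CLIPVisionModelWithProjection

# Untested sketch: override image_size via from_pretrained kwargs.
# With image_size=704 and patch_size=32, the number of positions changes
# (22*22 + 1 = 485 vs. 7*7 + 1 = 50), so the pretrained position embeddings
# no longer fit; ignore_mismatched_sizes=True should skip them and (I assume)
# re-initialize them randomly, while all other weights load from the checkpoint.
vision_model = CLIPVisionModelWithProjection.from_pretrained(
    "openai/clip-vit-base-patch32",
    image_size=704,
    ignore_mismatched_sizes=True,
)
print(vision_model.config.image_size)  # hopefully 704

If that works, I assume the re-initialized position embeddings would still need fine-tuning on my data before the model is useful at the new resolution.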