Using ResNet50 weights inside `CLIPModel`

Hi.

The documentation for CLIP is really comprehensive, and along with a collaborator I was able to quickly cook something up.
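
For context, this is roughly what we have working right now. It's a minimal sketch based on the standard CLIP usage example from the docs, not our exact code:

```python
from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

# Current setup: the ViT-B/32 checkpoint, which works but is too heavy for our App Engine instance
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=["a photo of a cat", "a photo of a dog"],
                   images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # image-text similarity as probabilities
```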

Now, in order to reduce the RAM requirements, particularly on App Engine (GCP), we need a model that is smaller than openai/clip-vit-base-patch32. Since OpenAI has pre-trained ResNet50 weights for CLIP (reference), I was wondering if it's possible to load those into CLIPModel. If so, could someone help me figure out how?
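
In case it helps, this is what I'd naively try: pull the RN50 weights with OpenAI's clip package and see whether any of the parameter names line up with the Hugging Face model. I'm fairly sure this isn't a real solution, since CLIPModel's vision tower looks like a ViT rather than a ResNet, so the state dict presumably won't match, but it shows the direction I'm thinking in:

```python
# pip install git+https://github.com/openai/CLIP.git
import clip
from transformers import CLIPModel

# OpenAI's pre-trained ResNet50 CLIP (the weights I'd like to reuse)
rn50, _ = clip.load("RN50", device="cpu")

# Hugging Face CLIP with a ViT vision tower
hf_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

# Naive attempt: copy over whatever parameter names happen to match.
# strict=False just to inspect how badly the key names diverge --
# almost certainly not a workable approach on its own.
missing, unexpected = hf_model.load_state_dict(rn50.state_dict(), strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```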