How do I load ViT weights into CLIPVisionModel?

How do I load ViTModel weights into CLIPVisionModel?

I want to do some research on ViT, and I need my changes to be visible in both ViT and CLIPVisionModel.

I have looked at both state dicts, and the parameter names differ. Should I manually rename the keys of the ViTModel state dict and then load the result into CLIPVisionModel?
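
To make the question concrete, here is a minimal sketch of the renaming I have in mind. The rules in `RENAME_RULES` are my own assumption from reading the two state dicts and may not match every transformers version, so they should be checked against `model.state_dict().keys()`. `rename_keys` is a hypothetical helper, not part of any library. I left out the embedding parameters on purpose, because their shapes appear to differ between the two models and renaming alone would not be enough for them.

```python
import re

# Assumed rename rules: (ViTModel key pattern, CLIPVisionModel key replacement).
# These come from comparing state dict keys by eye; verify them for your version.
RENAME_RULES = [
    (r"^encoder\.layer\.(\d+)\.attention\.attention\.query\.",
     r"vision_model.encoder.layers.\1.self_attn.q_proj."),
    (r"^encoder\.layer\.(\d+)\.attention\.attention\.key\.",
     r"vision_model.encoder.layers.\1.self_attn.k_proj."),
    (r"^encoder\.layer\.(\d+)\.attention\.attention\.value\.",
     r"vision_model.encoder.layers.\1.self_attn.v_proj."),
    (r"^encoder\.layer\.(\d+)\.attention\.output\.dense\.",
     r"vision_model.encoder.layers.\1.self_attn.out_proj."),
    (r"^encoder\.layer\.(\d+)\.layernorm_before\.",
     r"vision_model.encoder.layers.\1.layer_norm1."),
    (r"^encoder\.layer\.(\d+)\.layernorm_after\.",
     r"vision_model.encoder.layers.\1.layer_norm2."),
    (r"^encoder\.layer\.(\d+)\.intermediate\.dense\.",
     r"vision_model.encoder.layers.\1.mlp.fc1."),
    (r"^encoder\.layer\.(\d+)\.output\.dense\.",
     r"vision_model.encoder.layers.\1.mlp.fc2."),
    (r"^layernorm\.", r"vision_model.post_layernorm."),
]


def rename_keys(state_dict, rules=RENAME_RULES):
    """Return (renamed, unmatched): a new dict with translated keys,
    plus the list of source keys that no rule covered."""
    renamed, unmatched = {}, []
    for key, value in state_dict.items():
        for pattern, replacement in rules:
            new_key, count = re.subn(pattern, replacement, key)
            if count:
                renamed[new_key] = value
                break
        else:
            # No rule matched (e.g. embeddings, pooler); report instead of guessing.
            unmatched.append(key)
    return renamed, unmatched
```

I would then call something like `clip_model.load_state_dict(renamed, strict=False)` and inspect the returned missing and unexpected keys to see what was not transferred. Is this the right approach, or is there a supported way to do it?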