Discrepancy between OpenAI CLIP and Huggingface CLIP models

I’m fine-tuning the CLIP openai/clip-vit-base-patch32 model and trying to convert my project to use the Hugging Face library. I swapped out the CLIP model with the Hugging Face version. During training I’m consistently seeing lower loss and AUC metric values even though I’m using the same base model, hyperparameters, and data. Micro-averaged AUC drops from about 0.87 to 0.79, and loss is similarly affected. With the HF model the loss does still decrease and it’s clear the model is learning the data; performance is just not as good. So far I haven’t been able to find any cause for the discrepancy. Is this to be expected? Is there something different about the HF version that would require me to modify my inputs? My data is preprocessed, so I’m not using the Hugging Face tokenizer, though I did try it and the results were the same. Any and all help appreciated!

I solved my own problem. I was scaling the visual encoder position embeddings, and when porting the code that interpolates the position embeddings, I didn’t register position_ids as a buffer. Fixing this brought performance to an identical level with the original CLIP model.
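For anyone hitting the same issue, here is a rough sketch of the kind of fix, assuming a recent transformers version of CLIPVisionModel (where position_ids lives on vision_model.embeddings as a buffer). The function name interpolate_position_embeddings and the new_image_size argument are just illustrative, not part of my actual code:

```python
import torch
import torch.nn.functional as F
from transformers import CLIPVisionModel


def interpolate_position_embeddings(vision_model: CLIPVisionModel, new_image_size: int):
    """Resize the visual position embeddings for a larger input resolution
    and re-register position_ids as a buffer with the new sequence length."""
    embeddings = vision_model.vision_model.embeddings
    patch_size = embeddings.patch_embedding.kernel_size[0]
    old_weight = embeddings.position_embedding.weight.data  # (num_positions, dim)

    cls_pos = old_weight[:1]    # class-token position embedding
    patch_pos = old_weight[1:]  # per-patch position embeddings

    old_grid = int(patch_pos.shape[0] ** 0.5)
    new_grid = new_image_size // patch_size
    dim = patch_pos.shape[1]

    # Reshape to a 2D grid and bicubically interpolate to the new grid size.
    patch_pos = patch_pos.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(patch_pos, size=(new_grid, new_grid),
                              mode="bicubic", align_corners=False)
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(new_grid * new_grid, dim)

    new_weight = torch.cat([cls_pos, patch_pos], dim=0)
    num_positions = new_weight.shape[0]

    # Replace the embedding table with the interpolated one.
    embeddings.position_embedding = torch.nn.Embedding(num_positions, dim)
    embeddings.position_embedding.weight.data.copy_(new_weight)
    embeddings.num_positions = num_positions
    embeddings.num_patches = new_grid * new_grid

    # The step I originally missed: register position_ids as a buffer so it
    # matches the new sequence length and moves with the module's device.
    embeddings.register_buffer(
        "position_ids",
        torch.arange(num_positions).expand((1, -1)),
        persistent=False,
    )

    # Keep the config in sync so the model accepts the larger images.
    embeddings.image_size = new_image_size
    vision_model.config.image_size = new_image_size
```

If position_ids is left as a plain attribute instead of a buffer, it won’t follow the module in .to(device) calls or be handled consistently when loading a state dict, which is enough to silently degrade metrics the way I saw.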
