Finetuning: do I need to modify the model to go from 1000 to 2 output classes?

Dear hugging face community,

I am finetuning a model (the classic “google/vit-base-patch16-224”) to classify specific objects. I built my own dataset containing images from two different categories which I want to classify. However, I get this warning during training:

Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224 and are newly initialized because the shapes did not match:

  • classifier.bias: found shape torch.Size([1000]) in the checkpoint and torch.Size([2]) in the model instantiated
  • classifier.weight: found shape torch.Size([1000, 768]) in the checkpoint and torch.Size([2, 768]) in the model instantiated
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

From what I understand, the pretrained model classifies images into 1000 categories, and I am fine-tuning it on a dataset with only 2 categories. Is this a problem?

Do I need to modify the model so that it classifies using only 2 categories? I am not sure I understand what is going on here…

Thank you for your help!

Hi @Soubiy,
Are you following the Google Colab notebook from “Transfer Learning and Fine-tuning Vision Transformers for Image Classification” in the Hugging Face Community Computer Vision Course?

Otherwise, could you please share your code snippet?

Thanks @mahmutc for the course! I will read it carefully to understand how it works.


Hi,

Yes, this warning is expected. It tells you that all the pre-trained weights were loaded, but the classifier head on top has its weights and bias randomly initialized, because its shape (2 classes) no longer matches the checkpoint (1000 classes). Hence it needs to be fine-tuned on your custom dataset before it can make meaningful predictions.
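In case it helps, here is a minimal PyTorch sketch of what happens under the hood (the shapes are taken from the warning above; this mimics what `from_pretrained` does when the head shapes mismatch, it is not the actual Transformers loading code):

```python
import torch
import torch.nn as nn

# The checkpoint's 1000-class head, as reported in the warning
checkpoint = {"weight": torch.randn(1000, 768), "bias": torch.randn(1000)}

# A freshly instantiated 2-class head (random init), as created
# when the model is loaded with num_labels=2
head = nn.Linear(768, 2)

# Drop checkpoint tensors whose shapes differ from the new head,
# so the mismatched ones keep their random initialization
reference = head.state_dict()
filtered = {k: v for k, v in checkpoint.items()
            if v.shape == reference[k].shape}
head.load_state_dict(filtered, strict=False)  # nothing matches here

print(head.weight.shape)  # torch.Size([2, 768]) -- random weights, must be trained
```

In Transformers itself you would typically load the model with `ViTForImageClassification.from_pretrained("google/vit-base-patch16-224", num_labels=2, ignore_mismatched_sizes=True)`, which does exactly this filtering for you and emits the warning you saw.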