Can't Load ViT Model for Fine Tuning

Just trying to load some of the Google ViT models for fine-tuning. My code is as follows:

from transformers import ViTFeatureExtractor

model_name_or_path = 'google/vit-base-patch16-224-in21k'
feature_extractor = ViTFeatureExtractor.from_pretrained(model_name_or_path)

from transformers import ViTForImageClassification, TFViTForImageClassification

labels = ['Background', 'Pedestrian', 'Sign', 'TrafficLight', 'Vehicle'] #ds['train'].features['labels'].names

model = TFViTForImageClassification.from_pretrained(
    model_name_or_path,
    num_labels=len(labels),
    id2label={str(i): c for i, c in enumerate(labels)},
    label2id={c: str(i) for i, c in enumerate(labels)}
)

I can only use the in21k models for some reason. When I change model_name_or_path to any other Google ViT checkpoint (google/vit-base-patch16-224, google/vit-large-patch16-224, etc.), I get the following error at the from_pretrained step:

ValueError: cannot reshape array of size 768000 into shape (768,5)

Or a different number based on the model size.

Anyone know how to load these for fine tuning?

Hi,

The reason this works for google/vit-base-patch16-224-in21k but not for checkpoints like google/vit-base-patch16-224 is that the latter include a fine-tuned classification head on top (namely, a head with 1000 output neurons, since these checkpoints were fine-tuned on ImageNet-1k). That head's weight matrix has shape (768, 1000), i.e. 768,000 values, which is exactly the array the error says can't be reshaped into (768, 5).

However, since you'd like to use this model but change the number of output neurons to 5, you need to pass the additional ignore_mismatched_sizes=True argument. This ensures that the fine-tuned head with 1000 output neurons is replaced by a randomly initialized head with 5 output neurons:

model = TFViTForImageClassification.from_pretrained(
    model_name_or_path,
    num_labels=len(labels),
    id2label={str(i): c for i, c in enumerate(labels)},
    label2id={c: str(i) for i, c in enumerate(labels)},
    ignore_mismatched_sizes=True, # add this to replace the head
)
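
As a quick sanity check (a minimal sketch, assuming the model above loaded without errors and that TensorFlow is available), you can run a dummy batch through the model and confirm the new head returns one logit per label:

import tensorflow as tf

# Dummy batch of pixel values in channels-first format (batch, channels, height, width),
# matching the 224x224 inputs the ViT image processor would produce.
dummy_pixel_values = tf.random.uniform((1, 3, 224, 224))

outputs = model(pixel_values=dummy_pixel_values)
print(outputs.logits.shape)  # expected: (1, 5), one logit per label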

Gotcha, I assumed there would be a setting to just remove the head.

Thank you!