I'm trying to load some of the Google ViT models for fine-tuning. My code is as follows:
from transformers import TFViTForImageClassification, ViTFeatureExtractor

model_name_or_path = 'google/vit-base-patch16-224-in21k'
feature_extractor = ViTFeatureExtractor.from_pretrained(model_name_or_path)

labels = ['Background', 'Pedestrian', 'Sign', 'TrafficLight', 'Vehicle']  # ds['train'].features['labels'].names

# Load the TensorFlow ViT with a classification head sized for my labels
model = TFViTForImageClassification.from_pretrained(
    model_name_or_path,
    num_labels=len(labels),
    id2label={str(i): c for i, c in enumerate(labels)},
    label2id={c: str(i) for i, c in enumerate(labels)},
)
For some reason, only the in21k checkpoints load. When I change model_name_or_path to any other Google ViT (google/vit-base-patch16-224, google/vit-large-patch16-224, etc.), I get the following error at the from_pretrained step:
ValueError: cannot reshape array of size 768000 into shape (768,5)
The array size in the message changes with the model size.
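To make the numbers in that message concrete, here is a small sanity check rather than a fix. It assumes the array being reshaped is the checkpoint's classifier kernel laid out as (hidden_size, num_classes); the helper name implied_num_classes is made up for illustration.

```python
# Sanity check on the numbers in the error message (not a fix).
# Assumption: the array being reshaped is the checkpoint's classifier kernel,
# stored as (hidden_size, num_classes).

def implied_num_classes(array_size: int, hidden_size: int) -> int:
    """Return how many output classes a flat kernel of array_size implies."""
    if array_size % hidden_size != 0:
        raise ValueError("array_size is not a multiple of hidden_size")
    return array_size // hidden_size

hidden_size = 768     # first dimension of the target shape (768, 5)
array_size = 768000   # size reported in the ValueError
num_labels = 5        # len(labels) in the code above

checkpoint_classes = implied_num_classes(array_size, hidden_size)
print(checkpoint_classes)        # 1000
print(hidden_size * num_labels)  # 3840, the size a (768, 5) kernel can hold
```

If that reading is right, the non-in21k checkpoints ship with a 1000-class head, which cannot be reshaped into a 5-class one. from_pretrained does accept an ignore_mismatched_sizes argument that may be relevant here, though whether it resolves this case for the TensorFlow model is untested.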
Does anyone know how to load these checkpoints for fine-tuning?