I successfully trained a ViT classifier based on google/vit-large-patch16-224
by following the HF tutorial on classifying cancer cells.
I have about 45,000 images per class (3 classes). I tested the model on a test set and the results are pretty bad: ~60% precision, while I get around 91% with a CNN (YOLOv8). Both models were trained on the same dataset; the only difference is that the CNN used 640x640 images while the ViT resized them to 224x224.
For reference, my model classifies specific car parts into 3 classes, and each class contains possibly hundreds of variations, which is made even worse by the parts being in various states of rust or damage.
Is there a way to use bigger pictures (I don't mind the extra training time)? I have a feeling that once resized to 224x224, the images are too small for the model to learn to differentiate between the classes.
Or do I just not have enough samples?
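In case it helps frame the first question: assuming the Hugging Face `transformers` ViT implementation, the model can accept inputs larger than the pretraining resolution by interpolating the position embeddings at runtime (`interpolate_pos_encoding=True`), rather than resizing images down to 224x224. The sketch below uses a tiny random-weight config just to show the mechanics; in practice one would load google/vit-large-patch16-224 the same way.

```python
import torch
from transformers import ViTConfig, ViTModel

# Tiny config so the sketch runs quickly; image_size=224 mimics the
# resolution the real checkpoint was pretrained at.
config = ViTConfig(
    image_size=224,
    patch_size=16,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = ViTModel(config)

# 448x448 input: interpolate_pos_encoding=True rescales the 224-sized
# position-embedding grid to the new 28x28 patch grid on the fly.
pixels = torch.randn(1, 3, 448, 448)
out = model(pixels, interpolate_pos_encoding=True)

# (448 / 16)^2 patches + 1 [CLS] token = 785 positions
print(out.last_hidden_state.shape)  # torch.Size([1, 785, 32])
```

An alternative, if it applies, would be a checkpoint already pretrained at a higher resolution (e.g. google/vit-large-patch16-384), which avoids the interpolation step entirely.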