What is the best way to fine-tune ViT with a custom dataset?

ghosh-r · August 14, 2021, 11:04am

I have checked out the course and I have come across tutorials for fine-tuning pre-trained models for NLP tasks.

But I would really like to use the Vision Transformer model for classifying images that I have. I have about 1.8k images belonging to 3 categories, and I would like to use ViT for classification. I want to fine-tune the model to my dataset and thus leverage transfer learning.

This is a task of single-label classification.

How can I do this? What is the best way to fine-tune a pretrained ViT model for classification on a small dataset like this?

Can anyone point me to any recipes, tutorials, or other how-tos?

Thanks.

nielsr · August 15, 2021, 9:21am

Hi there! I made some demos on how to fine-tune ViT on a custom dataset here:
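
For a sense of the general shape, here is a minimal sketch of fine-tuning with the `transformers` library and PyTorch. It is not taken from the demos. The checkpoint name, the helper names (`build_label_maps`, `load_vit_for_classes`, `train_one_epoch`), and the batch format are assumptions chosen for illustration, so adapt them to your data.

```python
def build_label_maps(class_names):
    """Map integer ids to class names and back, as the model config expects."""
    id2label = {i: name for i, name in enumerate(class_names)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id


def load_vit_for_classes(class_names, checkpoint="google/vit-base-patch16-224-in21k"):
    """Load a pretrained ViT with a fresh classification head (assumed checkpoint)."""
    # Imported lazily so the label helper above works without the heavy dependencies.
    from transformers import AutoImageProcessor, ViTForImageClassification

    id2label, label2id = build_label_maps(class_names)
    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = ViTForImageClassification.from_pretrained(
        checkpoint,
        num_labels=len(class_names),  # sizes the new head for your categories
        id2label=id2label,
        label2id=label2id,
    )
    return processor, model


def train_one_epoch(model, processor, batches, optimizer, device="cpu"):
    """Run one pass over `batches`, an iterable of (PIL images, int labels) pairs."""
    import torch

    model.to(device).train()
    total_loss, steps = 0.0, 0
    for images, labels in batches:
        # The processor resizes and normalizes images to what the checkpoint expects.
        inputs = processor(images=images, return_tensors="pt").to(device)
        targets = torch.tensor(labels, device=device)
        # Passing `labels` makes the model compute cross-entropy loss itself.
        outputs = model(pixel_values=inputs["pixel_values"], labels=targets)
        optimizer.zero_grad()
        outputs.loss.backward()
        optimizer.step()
        total_loss += outputs.loss.item()
        steps += 1
    return total_loss / max(steps, 1)
```

With three categories you would call `load_vit_for_classes` with your three class names, then create an optimizer such as `torch.optim.AdamW(model.parameters(), lr=2e-5)` and call `train_one_epoch` for a few epochs. The learning rate is only a common starting point for fine-tuning, not a tuned value.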

sbenhur211000 · January 12, 2025, 4:55pm

Can you point me to something similar using TensorFlow?
