Vision transformer with Resnet backbone

gmarus · November 17, 2021, 6:10pm

Hello,

I am trying to build a ViT model (not initialized from a pre-trained checkpoint) on top of a trained ResNet backbone. How can I set up the ViT model so that it takes the ResNet hidden states as its input?

For example, timm.models.vision_transformer_hybrid has HybridEmbed, which lets one use a CNN backbone with ViT. Is there something similar here? Or does one need to go directly into the code and change the ViT patch embedding?

Thanks!
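A minimal PyTorch sketch of the HybridEmbed idea mentioned above, assuming plain `torch.nn` modules rather than any timm or Hugging Face API. The backbone's feature map is projected to the transformer width with a 1x1 convolution, flattened into a token sequence, and passed to a transformer encoder along with a CLS token and learned position embeddings. `HybridPatchEmbed`, `HybridViT`, the toy backbone and all sizes are hypothetical placeholders. In practice the trained ResNet trunk (for example a torchvision ResNet with its average-pool and fc layers removed) would replace the toy backbone, and `feature_dim` and `num_tokens` would change to match it.

```python
import torch
import torch.nn as nn


class HybridPatchEmbed(nn.Module):
    """Turns a CNN feature map into a token sequence (same idea as timm's HybridEmbed, not its code)."""

    def __init__(self, backbone: nn.Module, feature_dim: int, embed_dim: int):
        super().__init__()
        self.backbone = backbone
        # 1x1 conv maps backbone channels to the transformer width
        self.proj = nn.Conv2d(feature_dim, embed_dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.proj(self.backbone(x))      # (B, D, H', W')
        return feats.flatten(2).transpose(1, 2)  # (B, H'*W', D): one token per feature-map cell


class HybridViT(nn.Module):
    def __init__(self, backbone, feature_dim, num_tokens,
                 embed_dim=192, depth=4, num_heads=3, num_classes=10):
        super().__init__()
        self.patch_embed = HybridPatchEmbed(backbone, feature_dim, embed_dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # Learned position embeddings: one per feature-map cell plus one for the CLS token
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, dim_feedforward=4 * embed_dim,
                                           batch_first=True, norm_first=True)  # pre-norm, as in ViT
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.patch_embed(x)
        cls = self.cls_token.expand(tokens.shape[0], -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)
        return self.head(self.norm(tokens[:, 0]))  # classify from the CLS token


# Toy stand-in for the trained ResNet trunk: 64x64 input -> 256 x 4 x 4 feature map.
# Hypothetical alternative: nn.Sequential(*list(torchvision.models.resnet50().children())[:-2]),
# which yields a 2048-channel, 7x7 map for 224x224 inputs (feature_dim=2048, num_tokens=49).
backbone = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=4, padding=3),
    nn.ReLU(),
    nn.Conv2d(64, 256, kernel_size=3, stride=4, padding=1),
)
# Optionally freeze the already-trained backbone so only the transformer is trained
for p in backbone.parameters():
    p.requires_grad = False

model = HybridViT(backbone, feature_dim=256, num_tokens=16)
logits = model(torch.randn(2, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 10])
```

The only change compared with a standard ViT is the embedding step: instead of cutting the image into patches, each spatial cell of the backbone's last feature map becomes one token, so the number of position embeddings has to match that map's height times width.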

Related topics

Topic	Category	Replies	Views	Activity
Using detr with custom backbone	Models	3	625	December 6, 2024
Using Inception V3 as Backbone for Vision Transformer	Beginners	0	41	October 13, 2024
What is the best way to fine-tune ViT with a custom dataset?	Beginners	2	4105	January 12, 2025
How do i get bare bones of ViT transformers	Beginners	4	310	February 24, 2022
Can you provide an example of best practices for incorporating a pretrained HuggingFace Vision Transformer (ViT) into a PyTorch Lightning module?	Models	0	72	September 9, 2024