What does the output of PVTModel represent? Is it a set of image patches, as in ViT, or a feature map, as in a CNN?