What does the output of PVTModel represent? Is it a sequence of image patches, as in ViT, or a feature map, as in a CNN?