What does the output of PVTModel represent? Is it a sequence of image patches, as in ViT, or a feature map, as in a CNN?