If I understand correctly, google/vit-base-patch16-224-in21k corresponds to timm’s vit_base_patch16_224.augreg_in21k.
However, I found that the Hugging Face model has a Pooler layer that the timm model doesn’t have.
Besides, I checked some specific weights, e.g.,
- Hugging Face:
embeddings.patch_embeddings.projection.weight
- timm:
patch_embed.proj.weight
They are not equal.
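A minimal sketch of how such a comparison could be run, assuming `torch`, `timm`, and `transformers` are installed and that both checkpoints can be downloaded. The helper names `max_abs_diff` and `compare_patch_embed` and the tolerance `atol` are made up for illustration, not part of either library:

```python
import torch


def max_abs_diff(a: torch.Tensor, b: torch.Tensor) -> float:
    """Return the largest element-wise absolute difference, or inf if shapes differ."""
    if a.shape != b.shape:
        return float("inf")
    return (a.float() - b.float()).abs().max().item()


def compare_patch_embed(atol: float = 1e-6) -> bool:
    """Download both checkpoints and compare the patch-embedding weights."""
    # Imported lazily so max_abs_diff stays usable without these packages.
    import timm
    from transformers import ViTModel

    hf_sd = ViTModel.from_pretrained(
        "google/vit-base-patch16-224-in21k"
    ).state_dict()
    timm_sd = timm.create_model(
        "vit_base_patch16_224.augreg_in21k", pretrained=True
    ).state_dict()

    diff = max_abs_diff(
        hf_sd["embeddings.patch_embeddings.projection.weight"],
        timm_sd["patch_embed.proj.weight"],
    )
    print(f"max abs diff: {diff:.3e}")
    return diff <= atol


# Usage (downloads both checkpoints):
# compare_patch_embed()
```

Reporting the maximum absolute difference rather than a plain equality check makes it easier to tell a small numerical discrepancy from a genuinely different set of weights.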
Another minor difference could be the eps of LayerNorm.
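The eps values can be read directly from the instantiated modules. A small sketch, assuming both models build their normalization layers from `torch.nn.LayerNorm` or a subclass of it; the helper name `layernorm_eps` is hypothetical:

```python
import torch
from torch import nn


def layernorm_eps(model: nn.Module) -> set:
    """Collect the distinct eps values used by all nn.LayerNorm modules in a model."""
    return {m.eps for m in model.modules() if isinstance(m, nn.LayerNorm)}


# Usage, given already-loaded models hf_model and timm_model:
# print(layernorm_eps(hf_model), layernorm_eps(timm_model))
```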
I’m wondering whether the weights were converted correctly.