Are each of these models atomic building blocks, or can I dissect them? For example, I would like to use the Swin image transformer (microsoft/swin-base-patch4-window7-224) but leave out the classification head so it only outputs the embedding. However, it appears the model goes straight from inputs to logits. Is breaking up a model to get the embedding possible?
Hi! You can load this model as follows:
```python
from transformers import SwinModel

model = SwinModel.from_pretrained("microsoft/swin-base-patch4-window7-224")
```
if you are only interested in the "raw" model without the classification head.
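To make this concrete, here is a minimal sketch of extracting embeddings with `SwinModel`. The blank test image is a stand-in for a real photo; `last_hidden_state` gives per-patch features and `pooler_output` gives a single pooled image embedding:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, SwinModel

# A blank 224x224 image as a placeholder input (use a real image in practice)
image = Image.new("RGB", (224, 224))

checkpoint = "microsoft/swin-base-patch4-window7-224"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = SwinModel.from_pretrained(checkpoint)  # no classification head

with torch.no_grad():
    inputs = processor(images=image, return_tensors="pt")
    outputs = model(**inputs)

# Per-patch features: (batch, num_patches, hidden_size)
patch_embeddings = outputs.last_hidden_state
# Pooled image-level embedding: (batch, hidden_size)
image_embedding = outputs.pooler_output
print(patch_embeddings.shape, image_embedding.shape)
```

You can then use `image_embedding` directly for downstream tasks such as similarity search or as input to your own head.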