for i in range(len(ds['train'])):
    img = ds['train'][i]['image']
    encoding = feature_extractor(img, return_tensors='pt').to(device)
    with torch.no_grad():
        outputs = model(**encoding, output_hidden_states=True)
When I run this, I get a tuple of 13 tensors, each of shape 1x197x768. My question is: where can I find the class token feature vector?
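
For reference, here is a minimal sketch of how the shapes are usually read. It assumes the model is a Hugging Face ViT-base variant (12 encoder layers, 16x16 patches on a 224x224 input), which the post does not state. Under that assumption the 13 tensors would be the embedding output plus one per encoder layer, and the 197 tokens would be 1 class token followed by 196 patch tokens, so the class token would sit at index 0 of the second dimension. The `hidden_states` tuple below is a random stand-in with the same shapes as reported above, not real model output.

```python
import torch

# Stand-in for outputs.hidden_states (assumption: same layout as reported above):
# 13 tensors of shape (batch, tokens, hidden) = (1, 197, 768).
hidden_states = tuple(torch.randn(1, 197, 768) for _ in range(13))

# Last entry of the tuple: output of the final encoder layer.
last_layer = hidden_states[-1]

# Assuming the class token is prepended to the patch sequence,
# it is token index 0; the other 196 tokens are the image patches.
cls_vector = last_layer[:, 0, :]
patch_tokens = last_layer[:, 1:, :]

print(cls_vector.shape)    # torch.Size([1, 768])
print(patch_tokens.shape)  # torch.Size([1, 196, 768])
```

With the real model, the same indexing would apply to `outputs.hidden_states[-1]`. Any other entry of the tuple can be indexed the same way to get the class token at an earlier layer.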