I tried to follow the "Fine-Tune ViT for Image Classification with 🤗 Transformers" tutorial.
Training runs fine, but when I run inference I do NOT get logits; instead I get strange tanh-squashed values with a dimensionality in the hundreds.
The inference and training scripts, as well as the environment (Dockerfile), can be found here:
When loading the checkpoint for inference I get this warning:

```
Some weights of the model checkpoint at vit-base-beans-demo-v5/checkpoint-60 were not used when initializing ViTModel: ['classifier.bias', 'classifier.weight']
- This IS expected if you are initializing ViTModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ViTModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of ViTModel were not initialized from the model checkpoint at vit-base-beans-demo-v5/checkpoint-60 and are newly initialized: ['vit.pooler.dense.bias', 'vit.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```

The model outputs look like this (shortened):

```
outputs (tensor([[[-0.0922,  0.0321,  0.0705, ..., -0.4506, -0.3042,  0.2119],
         [ 0.0106,  0.1491,  0.0273, ..., -0.3192, -0.2063, -0.0749],
         ...,
         [-0.0709,  0.0378,  0.0878, ..., -0.5177, -0.1632,  0.1065]]],
       grad_fn=),
 tensor([[-1.0117e-01,  1.3335e-01, -5.3186e-02, -6.9971e-02,  1.2102e-01,
          -1.2692e-01,  6.0312e-02, -2.8494e-02,  7.3352e-02, -1.5986e-01,
          -1.1982e-01,  2.8124e-02, -1.5338e-0… shortened …
          -8.6240e-03,  9.4536e-02,  7.7640e-02, -1.1717e-02,  3.4637e-02,
           4.8355e-03,  1.0956e-01,  6.5691e-02,  1.9251e-01,  1.2720e-01,
          -1.3891e-01,  1.7495e-02, -5.4980e-02, -1.8399e-01,  1.2765e-01,
          -9.1845e-02, -1.4221e-01,  4.6340e-02]], grad_fn=))
```

And the inference script fails with:

```
Traceback (most recent call last):
  File "inference.py", line 29, in <module>
    logits = outputs.logits
AttributeError: 'tuple' object has no attribute 'logits'
```
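From the warning, my guess is that my inference script loads the checkpoint with `ViTModel`, which has no classifier head and (with `return_dict=False`) returns a plain tuple of `(last_hidden_state, pooler_output)`; the tanh-squashed, hundreds-dimensional values would match the pooler output. A self-contained sketch (using a hypothetical tiny config instead of my actual checkpoint, just to illustrate the output types) showing that `ViTForImageClassification` returns an object with `.logits`:

```python
import torch
from transformers import ViTConfig, ViTForImageClassification

# Hypothetical tiny config so the example runs without any checkpoint;
# in the real script one would instead load the fine-tuned weights, e.g.
# model = ViTForImageClassification.from_pretrained("vit-base-beans-demo-v5/checkpoint-60")
config = ViTConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    image_size=32,
    patch_size=8,
    num_labels=3,  # e.g. the three bean-leaf classes
)
model = ViTForImageClassification(config)
model.eval()

pixel_values = torch.randn(1, 3, 32, 32)  # dummy batch of one image
with torch.no_grad():
    outputs = model(pixel_values)

# ViTForImageClassification adds a classifier head, so this attribute exists:
logits = outputs.logits  # shape (1, num_labels)
```

If this diagnosis is right, swapping `ViTModel` for `ViTForImageClassification` in the inference script (or calling `AutoModelForImageClassification.from_pretrained`) should make `outputs.logits` work.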