100% accuracy when using a ViT model?

I am using Google's pretrained ViT model, 'vit-base-patch16-224-in21k', on my image dataset.

I have sequences of images (1,315 images in total, so not a large dataset) that I am trying to classify as "human present" or "no human present", i.e. binary classification.
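Because the images come in sequences, consecutive frames are likely to look very similar, so how the data is split matters. Below is a minimal sketch of a split that keeps each sequence entirely in one set. It assumes every image can be tagged with a sequence identifier; the `split_by_sequence` helper, the `seq{i}` ids, and the file paths are hypothetical and not taken from my actual pipeline.

```python
import random
from collections import defaultdict


def split_by_sequence(samples, test_fraction=0.2, seed=0):
    """Split (sequence_id, image_path) pairs so no sequence spans both sets."""
    # Group image paths under the sequence they belong to.
    by_sequence = defaultdict(list)
    for sequence_id, image_path in samples:
        by_sequence[sequence_id].append(image_path)

    # Shuffle whole sequences (not individual frames) reproducibly.
    sequence_ids = sorted(by_sequence)
    random.Random(seed).shuffle(sequence_ids)

    # Hold out at least one full sequence for testing.
    n_test = max(1, round(len(sequence_ids) * test_fraction))
    test_ids = set(sequence_ids[:n_test])

    train = [(s, p) for s in sequence_ids if s not in test_ids for p in by_sequence[s]]
    test = [(s, p) for s in sequence_ids if s in test_ids for p in by_sequence[s]]
    return train, test


# Hypothetical data: 5 sequences of 4 frames each.
samples = [(f"seq{i}", f"seq{i}/frame{j}.jpg") for i in range(5) for j in range(4)]
train, test = split_by_sequence(samples)

train_sequences = {s for s, _ in train}
test_sequences = {s for s, _ in test}
print("shared sequences:", train_sequences & test_sequences)
```

If a random per-image split was used instead, near-identical frames from the same sequence could end up in both training and test data, which would inflate the test accuracy.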

Model: "model_6"
Layer (type)                  Output Shape                Param #
input_7 (InputLayer)          [(None, 3, 224, 224)]       0
vit (TFViTMainLayer)          TFBaseModelOutputWithPool   86389248
global_average_pooling1d (Gl  (None, 768)                 0
dense_12 (Dense)              (None, 256)                 196864
dropout_40 (Dropout)          (None, 256)                 0
outputs (Dense)               (None, 1)                   257
Total params: 86,586,369
Trainable params: 197,121
Non-trainable params: 86,389,248
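As a sanity check on the summary, the parameter counts can be reproduced by hand. This is a small sketch that assumes the standard dense-layer formula (inputs × units + one bias per unit) and that the ViT backbone is frozen, which is what the non-trainable count suggests; the variable names are only illustrative.

```python
# Sizes read off the model summary.
hidden_size = 768             # width of the pooled ViT output, (None, 768)
dense_units = 256             # dense_12
backbone_params = 86_389_248  # vit (TFViTMainLayer), assumed frozen

# A dense layer has one weight per (input, unit) pair plus one bias per unit.
dense_params = hidden_size * dense_units + dense_units
output_params = dense_units * 1 + 1

trainable = dense_params + output_params
total = backbone_params + trainable

print("dense_12:", dense_params)
print("outputs:", output_params)
print("trainable:", trainable)
print("total:", total)
```

The results agree with the summary, so only the small classification head (about 197k parameters) is being trained on top of fixed ViT features.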


After training, I get 100% accuracy on the training, validation, and test sets!
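One thing that can be ruled out quickly is identical images appearing in more than one split. The following is a minimal sketch, assuming the images can be read as raw bytes; the `find_overlap` helper and the in-memory byte strings are hypothetical stand-ins for real files.

```python
import hashlib


def content_hash(image_bytes):
    """Return a SHA-256 digest of the raw image bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


def find_overlap(train_images, test_images):
    """Return names of test images whose bytes also occur in the training set."""
    train_hashes = {content_hash(data) for data in train_images.values()}
    return sorted(
        name for name, data in test_images.items()
        if content_hash(data) in train_hashes
    )


# Hypothetical stand-ins for image files: name -> raw bytes.
train_images = {"train/a.jpg": b"frame-A", "train/b.jpg": b"frame-B"}
test_images = {"test/c.jpg": b"frame-C", "test/d.jpg": b"frame-A"}

leaked = find_overlap(train_images, test_images)
print("test images also present in training data:", leaked)
```

Note that this only catches byte-for-byte duplicates. Frames that are merely very similar, as neighbouring frames in a sequence tend to be, would not be detected this way.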

My accuracy and loss curves look like this.

How is this possible?