Dino2 for classification has wrong number of labels

Ofir-S · September 14, 2023, 3:13pm

I am encountering an issue when using the Dinov2ForImageClassification model from the Hugging Face Transformers library, as outlined in the documentation (https://huggingface.co/docs/transformers/main/model_doc/dinov2#transformers.Dinov2ForImageClassification). Although I followed the provided code example and use the latest Transformers version, the resulting model performs binary classification instead of the expected ImageNet 1000-way classification. Specifically, the logits returned by the model have 2 entries per image, whereas ImageNet classification needs 1000.

Here is my code:

from transformers import AutoImageProcessor, Dinov2ForImageClassification
import torch
from datasets import load_dataset

# Load a sample image dataset (in this case, "huggingface/cats-image")
dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

# Load the image processor and the Dinov2ForImageClassification model
image_processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
model = Dinov2ForImageClassification.from_pretrained("facebook/dinov2-base")

# Prepare the input and obtain logits
inputs = image_processor(image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# For ImageNet classification, logits should have 1000 entries (one per class)
predicted_label = logits.argmax(-1).item()

However, loading the model prints the following warning:


Some weights of Dinov2ForImageClassification were not initialized from the model checkpoint at facebook/dinov2-base and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Additionally, logits has shape torch.Size([1, 2]) and model.num_labels is 2, so the model predicts only 2 labels instead of the expected 1000.

I’m seeking guidance on how to correctly use Dinov2ForImageClassification for ImageNet 1000-way classification as mentioned in the documentation.

Ofir-S · September 14, 2023, 3:47pm

When I load the model using the following code:

model = Dinov2ForImageClassification.from_pretrained("facebook/dinov2-base", num_labels=1000)

This does fix the output dimension, but the classification head is still not loaded from the checkpoint: the warning about newly initialized classifier.weight and classifier.bias remains. I want to use the model for classification without any additional training while still benefiting from the pretrained weights.

Ofir-S · October 2, 2023, 7:39am

Solved here:

GitHub issue (github.com/huggingface/transformers): "Fail loading pretrained weights for Dinov2ForImageClassification model", opened by ofirshifman, 04:16PM, 14 Sep 23 UTC

### System Info
Both on: transformers 4.32.0, transformers … 4.34.0.dev0

### Who can help?
_No response_

### Information
- [X] The official example scripts
- [ ] My own modified scripts

### Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction
I've encountered a bug with the Dinov2ForImageClassification model from Hugging Face Transformers. As per the provided documentation [here](https://huggingface.co/docs/transformers/main/model_doc/dinov2#transformers.Dinov2ForImageClassification), I've followed the code example using the latest Transformers version. However, when running the code, I encounter an error indicating that the model is performing binary classification instead of the expected ImageNet 1000-way classification. Here's my code:

```
from transformers import AutoImageProcessor, Dinov2ForImageClassification
import torch
from datasets import load_dataset

# Load a sample image dataset (in this case, 'huggingface/cats-image')
dataset = load_dataset('huggingface/cats-image')
image = dataset['test']['image'][0]

# Load the image processor and the Dinov2ForImageClassification model
image_processor = AutoImageProcessor.from_pretrained('facebook/dinov2-base')
model = Dinov2ForImageClassification.from_pretrained('facebook/dinov2-base')

# Prepare the input and obtain logits
inputs = image_processor(image, return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits

# The expected number of labels for ImageNet classification should be 1000
predicted_label = logits.argmax(-1).item()
```

Regardless of whether I specify num_labels=1000 during model initialization to correct the label dimensions, the following error persists:

```
Some weights of Dinov2ForImageClassification were not initialized from the model checkpoint at facebook/dinov2-base and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```

The issue persists, and I'm unable to utilize the pretrained Dinov2ForImageClassification model for ImageNet 1000-way classification as intended.

### Expected behavior
Loading without warning, having 1000-way long output vector, that is representing the correct classification labels of ImageNet.

See more here: https://discuss.huggingface.co/t/dino2-for-classification-has-wrong-number-of-labels/55027
