I’m having an issue with some models when trying to set id2label/label2id in the model config. I download the models once and reload them locally so I don’t have to re-download them constantly while I debug and learn the libraries.
Here’s my code:
from transformers import AutoTokenizer, AutoConfig
from transformers import AutoModelForSequenceClassification
model_path = "../model/pretrained/"
#model_name = "distilbert/distilbert-base-multilingual-cased"
#model_name = "distilbert-base-uncased"
model_name = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
print("Downloading Tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(model_name)
id2label = {0: 'action', 1: 'adventure', 2: 'crime', 3: 'family', 4: 'fantasy', 5: 'horror', 6: 'mystery', 7: 'romance', 8: 'scifi', 9: 'thriller'}
label2id = {'action': 0, 'adventure': 1, 'crime': 2, 'family': 3, 'fantasy': 4, 'horror': 5, 'mystery': 6, 'romance': 7, 'scifi': 8, 'thriller': 9}
print("Downloading Model")
config = AutoConfig.from_pretrained(model_name, label2id=label2id, id2label=id2label)
model = AutoModelForSequenceClassification.from_pretrained(model_name, config=config)
print("Saving Model")
model.save_pretrained(model_path)
print("Saving Tokenizer")
tokenizer.save_pretrained(model_path)
print("Testing Load from Disk")
print("Loading Tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(model_path)
print("Loading Config...")
config = AutoConfig.from_pretrained(model_path, local_files_only=True) #, label2id=label2id, id2label=id2label)
print("Loading Model...")
model = AutoModelForSequenceClassification.from_pretrained(
    model_path, config=config, local_files_only=True
)
print("Done loading")
The issue: with the model "distilbert/distilbert-base-uncased-finetuned-sst-2-english", if I set id2label in the config, either when downloading or when reloading from disk, I get the following error:
RuntimeError: Error(s) in loading state_dict for Linear:
size mismatch for bias: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([10]).
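If it helps anyone reproduce this, my understanding (which may be wrong, hence the question) is that the SST-2 checkpoint ships a 2-unit classifier head, so a 10-label config can't load its weights. The only way I found to avoid the crash is `ignore_mismatched_sizes=True`, which seems to throw away the old head and initialize a fresh 10-unit one; a minimal sketch:

```python
from transformers import AutoModelForSequenceClassification

id2label = {0: 'action', 1: 'adventure', 2: 'crime', 3: 'family', 4: 'fantasy',
            5: 'horror', 6: 'mystery', 7: 'romance', 8: 'scifi', 9: 'thriller'}
label2id = {v: k for k, v in id2label.items()}

# The fine-tuned SST-2 checkpoint stores a classifier of shape [2];
# forcing 10 labels mismatches unless the old head is discarded.
# ignore_mismatched_sizes=True re-initializes the classifier randomly,
# so the new 10-way head would still need fine-tuning before use.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    num_labels=10,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
)
```

With that flag the load completes (with a warning about newly initialized weights) instead of raising the size-mismatch RuntimeError.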
Can some models only do binary classification?
With either of the other models commented out in my code, setting id2label in the config at download time works, and I don’t even need to pass it again when reloading from disk (it appears to be saved into config.json by save_pretrained).
If I don’t set id2label when I download, but do set it when I load locally, the program crashes with the same error as above.
Is either behavior how models are supposed to work, or am I doing something wrong? The examples I found for multiclass classification left a lot to be desired (lots of typos and broken code) and focus almost entirely on binary classification.
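For comparison, this is the pattern that works for me with the base (non-fine-tuned) checkpoints: since the base model has no sequence-classification head in its weights, from_pretrained builds a fresh head sized to whatever label mapping I give it. A minimal sketch of that case:

```python
from transformers import AutoModelForSequenceClassification

# distilbert-base-uncased ships no classification head, so a fresh
# 10-way head is created to match num_labels; transformers only warns
# that the classifier weights are newly initialized (i.e. untrained).
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=10,
)
```

This is the case where saving and reloading locally also works without re-passing the label maps.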