Change label names on inference API

Hi there,
I recently uploaded my first model to the model hub and I’m wondering how I can change the label names that are returned by the inference API. Right now, the API returns “LABEL_0”, “LABEL_1”, etc. with the predictions and I would like it to be something like “Economy”, “Welfare”, etc.
I looked at the files of other hosted models and I saw that others changed the id2label and label2id in the config.json file, so I also did that here, but the inference API still returns “LABEL_0”. Do I need to change this somewhere else too? (Or maybe I just need to wait for a day or so until the model is refreshed on AWS?)

Update: I looked more deeply into the docs here and I didn’t find an explanation for how to change the label names. Maybe this could be added?

Thanks for your advice,
Moritz

Hey, does anyone have advice on this? I would really like to change the output of my models on the model hub, but I don’t understand how to make the API return something other than “LABEL_0”, “LABEL_1”, etc. See above for what I’ve tried.

Hey @MoritzLaurer, the way I usually do this is by specifying the label2id and id2label dictionaries in the model’s config class, e.g. for a text classifier you can do this:

from transformers import AutoConfig, AutoModelForSequenceClassification

# define the mappings as dictionaries
label2id = ...
id2label = ...
# define model checkpoint - can be the same model that you already have on the hub
model_ckpt = ...
# define config
config = AutoConfig.from_pretrained(model_ckpt, label2id=label2id, id2label=id2label)
# load model with config
model = AutoModelForSequenceClassification.from_pretrained(model_ckpt, config=config)
# export model to a local directory (target_name_or_path is a path of your choosing)
model.save_pretrained(target_name_or_path)

Then you can push the files to the hub as usual. HTH!
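For illustration, here is a minimal sketch of what the two dictionaries left as `...` above could look like. The label names are taken from the question and their order is an assumption: it has to match the order of the class indices the model was trained with.

```python
# Hypothetical label names, listed in the same order as the model's output indices.
label_names = ["Economy", "Welfare"]

# id2label maps class index -> readable name; label2id is the inverse mapping.
id2label = {i: name for i, name in enumerate(label_names)}
label2id = {name: i for i, name in id2label.items()}

print(id2label)
print(label2id)
```

Building both dictionaries from one list keeps them consistent with each other, which avoids a mismatch between the two mappings.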

Thanks for your response @lewtun! Ok great, now I know how to do it at the point of model creation in my script.

Do you happen to know the necessary steps for when the model is already on the hub? For example, I have uploaded a model on the model hub (without taking the steps you described) and now I want to change it in hindsight. I changed the config file by changing label2id and id2label to the strings I want, but that doesn’t seem to be sufficient.
See the updated config file here: config.json · MoritzLaurer/policy-distilbert-7d at main
But the inference API still returns “LABEL_0” etc., see here: MoritzLaurer/policy-distilbert-7d · Hugging Face.
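As a rough sketch of the manual edit described here, the snippet below rewrites the two mappings in a config dictionary, such as one loaded from a local copy of config.json. The helper name `apply_label_names` and the sample config excerpt are hypothetical. Note that JSON object keys are always strings, so ids appear as "0", "1", ... in id2label, while label2id stores them as integers.

```python
import json

def apply_label_names(config, label_names):
    """Return a copy of a config dict with id2label/label2id rebuilt from label_names."""
    updated = dict(config)  # shallow copy so the input dict is left untouched
    # JSON object keys are strings, so ids are written as "0", "1", ...
    updated["id2label"] = {str(i): name for i, name in enumerate(label_names)}
    updated["label2id"] = {name: i for i, name in enumerate(label_names)}
    return updated

# Hypothetical excerpt of a config.json with auto-generated labels.
old_config = json.loads(
    '{"id2label": {"0": "LABEL_0", "1": "LABEL_1"},'
    ' "label2id": {"LABEL_0": 0, "LABEL_1": 1}}'
)
new_config = apply_label_names(old_config, ["Economy", "Welfare"])
print(json.dumps(new_config, indent=2))
```

In practice the dictionary would be read from and written back to the repository's config.json with `json.load` and `json.dump`. This only covers the file contents, not when the hosted API picks up the change.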

Assuming you uploaded your model using the instructions in the docs, my suggestion would be to copy all the contents in the folder produced by saving your model

model.save_pretrained(target_name_or_path)

to the repository where your original model is stored. Then you can run

cd path/to/original/model
git add . && git commit -m "Update labels"
git push

and that should update both config.json and pytorch_model.bin. If you don’t have the repository of the original model you can just clone it again

git lfs install
git clone https://huggingface.co/MoritzLaurer/policy-distilbert-7d

HTH!

@lewtun
This is happening to me too. Even though the config file has label2id and id2label defined, the API still uses the old or auto-generated labels.
https://huggingface.co/aadishhug/distilbert-tweet-analysis/blob/main/config.json

One important detail: this is not happening in my Colab notebook. Please note that I used the pipeline API to get the output.
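One way to rule out a malformed config before suspecting the hosted API is to check that the two mappings agree with each other. The sketch below is only an illustration: the helper `check_label_maps` and both sample dictionaries are hypothetical, and it inspects a config dictionary rather than querying the hub.

```python
def check_label_maps(config):
    """Return a list of problems found in a config dict's label mappings."""
    problems = []
    id2label = config.get("id2label", {})
    label2id = config.get("label2id", {})
    for idx, name in id2label.items():
        # Auto-generated names look like LABEL_0, LABEL_1, ...
        if name.startswith("LABEL_"):
            problems.append(f"id {idx} still has placeholder name {name}")
        if name not in label2id:
            problems.append(f"{name} is missing from label2id")
        elif int(label2id[name]) != int(idx):
            problems.append(f"{name} maps to {label2id[name]} in label2id, expected {idx}")
    return problems

# A consistent config, and one where only id2label was edited by hand.
good = {"id2label": {"0": "Economy", "1": "Welfare"},
        "label2id": {"Economy": 0, "Welfare": 1}}
stale = {"id2label": {"0": "Economy", "1": "Welfare"},
         "label2id": {"LABEL_0": 0, "LABEL_1": 1}}

print(check_label_maps(good))
print(check_label_maps(stale))
```

An empty list means the mappings are mutually consistent. It does not say anything about whether the hosted widget has refreshed yet.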

Hi @aadishhug, if I remember correctly, when I made these updates by hand in the config.json on the hub, the changes did not show up in the inference widget at first, but after a few days they did. Maybe the internal updates on the hub just take a while.
Otherwise, the solutions from @lewtun outlined above worked for me. The best approach is to set label2id/id2label directly when you train and upload the model.

Thanks @MoritzLaurer for letting me know. I will let it take its own course then.