Does a tokenizer keep the mapping between my labels and their encodings?

And if not, where can I keep it? Do I have to map it myself or can it be done automatically somewhere?
I have a multi-class use case, and so far I've used the LabelEncoder/LabelBinarizer classes. Is there a better way?
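For context, this is the kind of mapping I mean. A minimal sketch done by hand rather than with sklearn's LabelEncoder; the label names are placeholders, not from a real dataset:

```python
# Hand-rolled version of the label <-> id mapping LabelEncoder produces.
labels = ["label1", "label2", "label3"]

# sorted() makes the assignment deterministic, as LabelEncoder does
label2id = {name: i for i, name in enumerate(sorted(set(labels)))}
id2label = {i: name for name, i in label2id.items()}

encoded = [label2id[name] for name in labels]   # e.g. [0, 1, 2]
decoded = [id2label[i] for i in encoded]        # back to the names
```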


If you're asking about token classification, you can pass already-tokenized text to the tokenizer and set the parameter is_split_into_words=True. The tokenizer's output then exposes a word_ids() method that gives the mapping between your tokenization and the additional subword tokenization done by the tokenizer.
In this script for token classification there is a tokenize_and_align_labels function that I think does what you need.
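A simplified sketch of the alignment that function performs. To keep it self-contained, word_ids is hard-coded to the shape the tokenizer output's word_ids() method returns (one entry per subword token, None for special tokens like [CLS]/[SEP]) instead of coming from a real tokenizer:

```python
def align_labels(word_labels, word_ids, ignore_index=-100):
    """Spread word-level labels over subword tokens."""
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:                 # special token -> ignored by the loss
            aligned.append(ignore_index)
        elif wid != previous:           # first subword of a word -> its label
            aligned.append(word_labels[wid])
        else:                           # later subwords -> ignored
            aligned.append(ignore_index)
        previous = wid
    return aligned

# [CLS] Hugging ##Face rocks [SEP] -- two words, labels 1 and 0
word_ids = [None, 0, 0, 1, None]
print(align_labels([1, 0], word_ids))   # [-100, 1, -100, 0, -100]
```

The real script also has the option of labeling every subword instead of only the first; this shows just the default behavior.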

I see, thank you, but actually I was referring to the labels of the predictions. Maybe it's unrelated to tokenizers?
Say I have 3 labels, the simple mapping from [0, 1, 2] to ["label1", "label2", "label3"]. I assume there should be a place where the mapping is saved automatically, but maybe I'm wrong?

Sorry, I didn't get it the first time! In the same script you can find "id2label" and "label2id", which are set manually and passed to the model's config. These are dictionaries mapping label ids to labels and vice versa. Check whether your model's config already has these set, or do the mapping yourself and add them to the config!
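A minimal sketch of what that looks like. The dicts are plain Python; the transformers calls are shown commented out since they need a network download, and the model name and labels are placeholders for your own:

```python
# Build the two mappings once and keep them with the model config.
id2label = {0: "label1", 1: "label2", 2: "label3"}
label2id = {name: i for i, name in id2label.items()}

# from transformers import AutoConfig, AutoModelForSequenceClassification
# config = AutoConfig.from_pretrained(
#     "bert-base-uncased",
#     num_labels=len(id2label),
#     id2label=id2label,
#     label2id=label2id,
# )
# model = AutoModelForSequenceClassification.from_pretrained(
#     "bert-base-uncased", config=config,
# )
```

Once id2label/label2id are in the config, save_pretrained() writes them into config.json, so the mapping travels with the model instead of living in a separate LabelEncoder.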