Creating label2idx dictionary

Hi all,

I am trying to create a label2idx dictionary, where e.g.

label2idx = {
    0: [...],  # indices of all samples with label 0
    1: [...],  # indices of all samples with label 1
    2: [...],  # indices of all samples with label 2
    ...
}

I am loading the dataset via the Hugging Face datasets library (e.g. glue/mrpc using load_dataset). Is there a nice way to do this?

The naive solution is to loop over all samples and build label2idx by hand. Is there a nicer way to do this using the library?

You can get the mapping between label ids and label names directly from the dataset features:

from datasets import load_dataset

dataset = load_dataset("glue", "mrpc")

labels = dataset["train"].features["label"].names

id2label = {idx: label for idx, label in enumerate(labels)}
label2id = {label: idx for idx, label in enumerate(labels)}
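
Note that the snippet above maps label ids to label names. For the mapping asked about in the question (label to the indices of all samples with that label), a minimal sketch could group the indices in a single pass. The helper name build_label2idx and the toy label list are assumptions for illustration; with the real dataset you would presumably pass dataset["train"]["label"] instead.

```python
from collections import defaultdict


def build_label2idx(labels):
    """Group sample indices by their label value.

    `labels` is any iterable of labels, one per sample, in dataset order.
    """
    label2idx = defaultdict(list)
    for idx, label in enumerate(labels):
        label2idx[label].append(idx)
    # Convert to a plain dict so missing labels raise KeyError
    return dict(label2idx)


# Toy labels standing in for dataset["train"]["label"] (assumption)
example_labels = [1, 0, 1, 1, 0]
label2idx = build_label2idx(example_labels)
print(label2idx)  # {1: [0, 2, 3], 0: [1, 4]}
```

This still loops over the samples, but only over the label column, so it avoids loading the text fields of every sample.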