Chapter 5 questions

mdroth · May 18, 2022, 11:53pm

Hi there,

the last “Try it out!” task here asks us

to “create an own dataset of GitHub issues” and
to “fine-tune a multilabel classifier” (for bonus points ).

I have created this dataset. It has 57 different labels and an instance may be labelled with any combination of those. I would like to add the class label names ["bug", "benchmark", "performance", ...] to the dataset. Inspired by this forum post, I have tried the following, yet without success:

features = transformers_issues_labels.features.copy()
features["arr_labels"] = ClassLabel(names=unique_labels)
transformers_issues_labels = transformers_issues_labels.map(
    lambda batch: batch, batched=False, features=features
)

TypeError: Couldn't cast array of type list<item: int64> to int64

=> Two questions:

How to build a classifier for this task (e.g. “MultiLabelFromPretrainedClassifier” or something like this…)?
How can I add the class label names to my dataset (specifically to the “arr_labels” features, assuming this makes sense)?

P.s. In any case: Thanks a ton to all contributors of this course. I am learning a lot and looking forward to part 3.

Topic		Replies	Views
The 🤗 Datasets library - Hugging Face Course 🤗Datasets	1	567	November 25, 2021
Got wrong row number of dataset viewer 🤗Hub	11	602	June 26, 2024
Undesired behavior when using load_dataset 🤗Datasets	4	945	April 17, 2023
Mapping 1 multi-element column of a dataset to multi row dataset with 1 element per row, duplicating other features 🤗Datasets	6	2529	November 4, 2022
How to operate on columns of a dataset Beginners	2	149	January 30, 2025

Chapter 5 questions

Related topics