Chapter 5 questions

Hi there,

the last “Try it out!” task here asks us

  1. to “create [our] own dataset of GitHub issues” and
  2. to “fine-tune a multilabel classifier” (for bonus points :wink:).

I have created this dataset. It has 57 different labels, and each instance may carry any combination of them. I would like to add the class label names ["bug", "benchmark", "performance", ...] to the dataset. Inspired by this forum post, I tried the following, without success:

from datasets import ClassLabel

# unique_labels is the list of the 57 label names
features = transformers_issues_labels.features.copy()
features["arr_labels"] = ClassLabel(names=unique_labels)
transformers_issues_labels = transformers_issues_labels.map(
    lambda example: example, batched=False, features=features
)

TypeError: Couldn't cast array of type list<item: int64> to int64

So, two questions:

  1. How can I build a classifier for this task (e.g. a “MultiLabelFromPretrainedClassifier” or something similar)?
  2. How can I add the class label names to my dataset (specifically to the “arr_labels” feature, assuming this makes sense)?
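
Regarding the first question: as far as I can tell there is no class called “MultiLabelFromPretrainedClassifier”. My understanding is that the sequence classification models in Transformers accept problem_type="multi_label_classification" in their config, which should switch the loss to a binary cross-entropy over all labels and expect float multi-hot targets. If that is right, each list of label ids would first have to be turned into a fixed-length vector. A minimal sketch of that step, where multi_hot is a hypothetical helper name of mine and 4 labels stand in for the real 57:

```python
def multi_hot(label_ids, num_labels):
    """Return a float vector with 1.0 at every active label index."""
    vector = [0.0] * num_labels
    for label_id in label_ids:
        vector[label_id] = 1.0
    return vector


num_labels = 4  # stand-in; the real dataset would use 57
example_target = multi_hot([0, 2], num_labels)
print(example_target)
```

An issue without any labels would simply map to an all-zero vector.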

P.S. In any case, thanks a ton to all contributors of this course. I am learning a lot and looking forward to part 3.