I know this is a very old question, but I am answering it anyway. I ran into the same error and came here, but nothing in this thread resolved it. After reviewing a few notebooks, I got it working.
Try this:
from datasets import Features, Sequence, ClassLabel, Value, Array2D, Array3D

# we need to define custom features
features = Features({
    'pixel_values': Array3D(dtype="float32", shape=(3, 224, 224)),
    'input_ids': Sequence(feature=Value(dtype='int64')),
    'attention_mask': Sequence(Value(dtype='int64')),
    'bbox': Array2D(dtype="int64", shape=(512, 4)),
    # 'labels': ClassLabel(num_classes=len(labels), names=labels),
    'labels': Sequence(ClassLabel(names=label_list)),
})
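from PIL import Image

# processor, label2id, text_column_name and boxes_column_name must be defined before this
def prepare_examples(examples):
    # load each image as RGB and resize it to match the pixel_values shape above
    images = [Image.open(path).convert("RGB").resize(size=(224, 224)) for path in examples['image_path']]
    words = examples[text_column_name]
    boxes = examples[boxes_column_name]
    # convert the string labels to integer ids
    word_labels = [[label2id[label]] for label in examples["label"]]
    encoding = processor(images, words, boxes=boxes, word_labels=word_labels,
                         truncation=True, padding='max_length')
    return encoding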
This worked for me. My labels were strings from the start, so I used a dict, label2id, to convert each string to a number and stored the result in a list. Because the labels go through that dict and end up as integer ids, I declare labels as ClassLabel in the features.
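
In case it helps, here is a minimal sketch of how label2id could be built from label_list. The label names and the sample batch below are made up for illustration, so swap in your own. The important part is that the ids follow the order of label_list, because ClassLabel(names=label_list) assigns ids by position in that same list.

```python
# Hypothetical label names; replace with the labels from your own dataset.
label_list = ["O", "B-HEADER", "B-QUESTION", "B-ANSWER"]

# Map each label string to its position in label_list, and back again.
label2id = {label: idx for idx, label in enumerate(label_list)}
id2label = {idx: label for label, idx in label2id.items()}

# A made-up batch with one string label per example, converted the same
# way as in prepare_examples above.
batch_labels = ["B-QUESTION", "O", "B-ANSWER"]
word_labels = [[label2id[label]] for label in batch_labels]

print(word_labels)  # [[2], [0], [3]]
```

id2label is not needed for the fix itself, but it is handy later for turning predicted ids back into strings.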