TypeError: Couldn't cast array of type int64 while mapping the dataset

I am trying to fine-tune a LayoutLMv2 model for document image classification, but I am getting an error in the preprocessing stage. Can someone please help me fix it?

dataset = Dataset.from_pandas(df_copy)
Dataset({
    features: ['Image_File_Path', 'Label'],
    num_rows: 12
})

This is the code:

import os
from PIL import Image
from datasets import Array2D, Array3D, Features, Sequence, Value

# we need to define custom features
path = './Image FIle/'

features = Features({
    'image': Array3D(dtype="int64", shape=(3, 224, 224)),
    'input_ids': Sequence(feature=Value(dtype='int64')),
    'attention_mask': Sequence(Value(dtype='int64')),
    'token_type_ids': Sequence(Value(dtype='int64')),
    'bbox': Array2D(dtype="int64", shape=(512, 4)),
    'labels': Sequence(Value(dtype='int64')),
})

def preprocess_data(examples):
  # take a batch of images
  images = [Image.open(os.path.join(path, file)).convert("RGB") for file in examples['Image_File_Path']]
  encoded_inputs = processor(images, padding="max_length", truncation=True)
  # add labels
  encoded_inputs["labels"] = [label2id[label] for label in examples["Label"]]
  return encoded_inputs

encoded_dataset = dataset.map(preprocess_data, remove_columns=dataset.column_names, features=features, batched=True, batch_size=2)

This is the error:

**TypeError**: Couldn't cast array of type int64 to Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None)


It looks like the cast fails on the labels column: preprocess_data returns a single label per example, so the correct feature type for it is Value("int64").
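To see why, look at the shape of what preprocess_data puts under "labels" for one batch. The sketch below uses a hypothetical helper, suggested_feature (it is not part of the datasets library), and a made-up label2id mapping. A flat list with one int per example corresponds to Value, while a list of lists would correspond to Sequence:

```python
# Hypothetical helper, not part of the datasets library: given one batched
# column, report whether each example holds a scalar or a list.
def suggested_feature(batch_column):
    """Return "Sequence" if examples are lists/tuples, otherwise "Value"."""
    if batch_column and isinstance(batch_column[0], (list, tuple)):
        return "Sequence"
    return "Value"

# What preprocess_data stores under "labels" for a batch of two examples,
# using a hypothetical label2id mapping.
label2id = {"invoice": 0, "letter": 1}
batch_labels = [label2id[label] for label in ["invoice", "letter"]]

print(batch_labels)                     # [0, 1]
print(suggested_feature(batch_labels))  # Value
print(suggested_feature([[0], [1]]))    # Sequence
```

Because each example contributes a plain int, the column Arrow builds is int64, and declaring it as a Sequence asks datasets to cast scalars into lists, which is the cast the error message complains about.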

Thanks for the guidance,
I changed the code as you suggested, but I am still getting the same error. Can you please help me?

This is not what I meant. What I meant is that the value of the labels key in the features dictionary should be Value("int64") (not the current Sequence(Value("int64"))).

Thanks for the reply,
This is what I did. In the DataFrame the column is int64, but when I load the DataFrame into a Hugging Face dataset it shows as int.
Can you please advise me on what to do next?

I know this is a very old question, but I am answering it anyway. When I hit the same error I came here, but this thread did not resolve it for me. After reviewing a few notebooks, I fixed the error.

Try this:

from datasets import Features, Sequence, ClassLabel, Value, Array2D, Array3D
from PIL import Image

# we need to define custom features
features = Features({
    'pixel_values': Array3D(dtype="float32", shape=(3, 224, 224)),
    'input_ids': Sequence(feature=Value(dtype='int64')),
    'attention_mask': Sequence(Value(dtype='int64')),
    'bbox': Array2D(dtype="int64", shape=(512, 4)),
#     'labels': ClassLabel(num_classes=len(labels), names=labels),
})

def prepare_examples(examples):
    # load, convert and resize every image in the batch
    images = [Image.open(path).convert("RGB").resize(size=(224,224)) for path in examples['image_path']]
    # text_column_name and boxes_column_name are defined elsewhere (not shown)
    words = examples[text_column_name]
    boxes = examples[boxes_column_name]
    # map each string label to its integer id
    word_labels = [[label2id[label]] for label in examples["label"]]
    encoding = processor(images, words, boxes=boxes, word_labels=word_labels,
                         truncation=True, padding='max_length')
    return encoding

It worked for me. My labels were strings from the start, so I used the label2id dict to convert each string to a number and store it in a list. Because I am using that dict, I declare labels as a ClassLabel.
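If your labels also start out as strings, a minimal sketch of building label2id (and the reverse id2label) might look like this. raw_labels is hypothetical sample data, and sorting the unique names is just one way to get a stable id assignment:

```python
# Hypothetical string labels standing in for the "label" column.
raw_labels = ["letter", "invoice", "letter", "memo"]

# Sorted unique names give a deterministic id assignment.
labels = sorted(set(raw_labels))
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

# Convert every string label to its integer id.
encoded = [label2id[label] for label in raw_labels]
print(label2id)  # {'invoice': 0, 'letter': 1, 'memo': 2}
print(encoded)   # [1, 0, 1, 2]
```

Passing the same labels list to ClassLabel(names=labels) keeps the ids consistent, because ClassLabel assigns each name the index it has in names.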

