Mask must be a pyarrow.Array of type boolean
  File "/home/huangxijie/MedMLLM_attack/hf_dataset.py", line 14, in <module>
    dataset.push_to_hub("MedMLLM-attack/3MAD-24K", max_shard_size="300MB")
TypeError: Mask must be a pyarrow.Array of type boolean
import pandas as pd
from datasets import Dataset, Image
# Read the CSV file
df = pd.read_csv("MedMQ-2k/metadata.csv")
# Create a Hugging Face Dataset
dataset = Dataset.from_pandas(df)
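# Copy file_name into a new "image" column (batched, so batch["file_name"] is a list of paths)
dataset = dataset.map(lambda batch: {"image": batch["file_name"]}, batched=True)
# Cast the "image" column of file paths to the Image feature
dataset = dataset.cast_column("image", Image())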
# Upload to Hugging Face Hub (make sure authentication is set up)
dataset.push_to_hub("MedMLLM-attack/3MAD-24K", max_shard_size="300MB")
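
One thing that may be worth ruling out before the upload (an assumption on my part, not a confirmed cause of the TypeError): rows in metadata.csv whose file_name is empty or does not point to a real file. The sketch below uses only the standard library. The helper name resolve_image_paths is hypothetical, and it assumes that file_name entries are relative to the folder containing metadata.csv.

```python
import csv
from pathlib import Path


def resolve_image_paths(metadata_csv, column="file_name"):
    """Return absolute image paths listed in metadata_csv, or raise on bad rows.

    Assumption: entries in `column` are relative to the CSV's own folder
    (absolute entries are kept as they are).
    """
    base_dir = Path(metadata_csv).resolve().parent
    resolved, problems = [], []
    with open(metadata_csv, newline="", encoding="utf-8") as handle:
        # Row numbers start at 2 because line 1 of the file is the header.
        for row_number, row in enumerate(csv.DictReader(handle), start=2):
            name = (row.get(column) or "").strip()
            path = base_dir / name
            if not name:
                problems.append(f"row {row_number}: empty {column}")
            elif not path.is_file():
                problems.append(f"row {row_number}: missing file {path}")
            else:
                resolved.append(str(path))
    if problems:
        raise ValueError("bad metadata rows:\n" + "\n".join(problems))
    return resolved
```

If the check passes, the returned list could be used to build the image column, for example with Dataset.from_dict({"image": paths}) followed by cast_column("image", Image()), before calling push_to_hub. If the error persists with clean paths, the cause is probably elsewhere, for example in the installed datasets or pyarrow versions.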