Any workaround for push_to_hub() limits?

Mask must be a pyarrow.Array of type boolean
  File "/home/huangxijie/MedMLLM_attack/hf_dataset.py", line 14, in <module>
    dataset.push_to_hub("MedMLLM-attack/3MAD-24K", max_shard_size="300MB")
TypeError: Mask must be a pyarrow.Array of type boolean

import pandas as pd
from datasets import Dataset, Image

# Read the CSV file
df = pd.read_csv("MedMQ-2k/metadata.csv")

# Create a Hugging Face Dataset
dataset = Dataset.from_pandas(df)

# Copy the file_name column into a new "image" column (batched, so each value is a list of paths)
dataset = dataset.map(lambda batch: {"image": batch["file_name"]}, batched=True)

# Cast the image column (file paths) to the Image feature type
dataset = dataset.cast_column("image", Image())

# Upload to Hugging Face Hub (make sure authentication is set up)
dataset.push_to_hub("MedMLLM-attack/3MAD-24K", max_shard_size="300MB")
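One sanity check that may be worth running before the push is sketched below. It only rules out bad rows in the metadata and is not a confirmed cause of the TypeError. The helper name `find_bad_rows` is hypothetical, and it assumes that relative paths in `file_name` are resolved against the folder containing the CSV, which the script above does not state.

```python
import csv
from pathlib import Path


def find_bad_rows(metadata_csv, image_root=None, column="file_name"):
    """Return (row_number, value, reason) for rows whose image path is empty or missing on disk."""
    metadata_csv = Path(metadata_csv)
    # Assumption: relative paths are resolved against the CSV's own folder
    # unless a different image_root is given.
    root = Path(image_root) if image_root is not None else metadata_csv.parent
    bad = []
    with metadata_csv.open(newline="", encoding="utf-8") as f:
        # Row 1 is the header, so data rows start at 2.
        for row_number, row in enumerate(csv.DictReader(f), start=2):
            value = (row.get(column) or "").strip()
            if not value:
                bad.append((row_number, value, "empty"))
            elif not (root / value).is_file():
                bad.append((row_number, value, "file not found"))
    return bad
```

Calling `find_bad_rows("MedMQ-2k/metadata.csv")` returns an empty list when every row has a non-empty path that exists on disk. Any rows it does report can be fixed or dropped before retrying the upload.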