load_dataset raises pyarrow.lib.ArrowCapacityError

I have already set num_shards to 100, but I still get the same error:

data_set = load_dataset(self.data_file_path, cache_dir=cache_dir, split="train")
data_set = data_set.shard(num_shards=100, index=0)

It seems the error is raised inside load_dataset() itself, before shard() is ever called, so sharding afterwards has no effect on it.
