Saving custom dataset does not finish


I created a custom dataset to train Donut and want to save it to disk:

dataset.save_to_disk("/content/drive/MyDrive/image-dataset/huggingface-dataset")

This is the structure of the dataset:

DatasetDict({
    train: Dataset({
        features: ['image', 'id', 'ground_truth'],
        num_rows: 528
    })
    test: Dataset({
        features: ['image', 'id', 'ground_truth'],
        num_rows: 133
    })
})

Hi! Can you interrupt the process while it hangs and paste the error traceback here?

This behavior is due to a bug in save_to_disk where we skip the last progress bar update. However, this bug doesn’t affect the actual saving of a dataset to disk. This can be verified by reloading the saved dataset with load_from_disk.
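For reference, a minimal sketch of that reload check might look like the following. The helper names split_row_counts and check_saved_dataset are hypothetical (they are not part of datasets), and the sketch assumes the saved object is a DatasetDict like the one shown above; only load_from_disk and the num_rows attribute come from the library.

```python
def split_row_counts(dataset_dict):
    """Return {split_name: num_rows} for a DatasetDict-like mapping."""
    return {name: split.num_rows for name, split in dataset_dict.items()}


def check_saved_dataset(path, expected_counts):
    """Reload a dataset written by save_to_disk and compare row counts.

    Hypothetical helper: returns (matches, actual_counts).
    """
    # Imported here so split_row_counts can be used without datasets installed.
    from datasets import load_from_disk

    reloaded = load_from_disk(path)
    actual = split_row_counts(reloaded)
    return actual == expected_counts, actual
```

With the dataset from the question, the call would be check_saved_dataset("/content/drive/MyDrive/image-dataset/huggingface-dataset", {"train": 528, "test": 133}). If the first returned value is True, both splits were written with the expected number of rows.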

PS: You can get a proper output by updating datasets to the latest release.

Thank you. Yes, it seems that everything is fine after loading the saved dataset.