FileNotFoundError after using builder.download_and_prepare() to S3

I used the following code to upload a local dataset to a private S3 bucket. The dataset is very large, and the upload took about 16 hours to run.

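from glob import glob

import aiobotocore.session
import s3fs
from datasets import load_dataset_builder

# Credentials come from the "default" AWS profile.
s3_session = aiobotocore.session.AioSession(profile='default')
storage_options = {"session": s3_session}
fs = s3fs.S3FileSystem(**storage_options)

data_files = {"train": sorted(glob("path/to/parquets/*.parquet"))}
output_dir = "s3://output/dir/here"
builder = load_dataset_builder("parquet", data_files=data_files)
# Writes the prepared dataset to the bucket as Parquet files.
builder.download_and_prepare(output_dir, storage_options=storage_options, file_format="parquet")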

Unfortunately, when I attempt to load the dataset from the S3 bucket using load_from_disk(), I get the following error:
FileNotFoundError: Directory s3://output/dir/here is neither a Dataset directory nor a DatasetDict directory.
I'm not sure where I went wrong. Any help would be greatly appreciated.

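Hi! load_from_disk only works on a dataset that was saved using save_to_disk. download_and_prepare with file_format="parquet" writes plain Parquet files instead, so load_from_disk does not recognize the output directory.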
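
To make the error message easier to interpret, here is a minimal sketch of the kind of check that seems to sit behind it. As far as I know, save_to_disk writes dataset_info.json and state.json for a Dataset, and dataset_dict.json for a DatasetDict, and load_from_disk looks for those files. classify_saved_dir is a hypothetical helper written for illustration, not part of the datasets library, and the isfile parameter is an assumption that lets you pass something like fs.isfile for a remote filesystem.

```python
import os
import posixpath

# Marker files that save_to_disk is understood to write (assumption based on
# the layout of saved datasets; not taken from the original post).
DATASET_MARKERS = ("dataset_info.json", "state.json")
DATASET_DICT_MARKER = "dataset_dict.json"


def classify_saved_dir(path, isfile=os.path.isfile):
    """Return "Dataset", "DatasetDict", or None for the given directory.

    `isfile` is any callable that reports whether a path is a file, for
    example os.path.isfile locally or the isfile method of an fsspec
    filesystem for a bucket.
    """
    # A single saved Dataset needs both marker files.
    if all(isfile(posixpath.join(path, name)) for name in DATASET_MARKERS):
        return "Dataset"
    # A saved DatasetDict has one marker file at the top level.
    if isfile(posixpath.join(path, DATASET_DICT_MARKER)):
        return "DatasetDict"
    # Neither layout was found, which is the case that raises the error.
    return None
```

If this returns None for your output directory, the files there are not in the save_to_disk layout, and a Parquet-aware loader is the more likely fit for reading them back.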

Why not simply upload your files directly using s3fs or aiobotocore?
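
A direct upload could look roughly like the sketch below. upload_files is a hypothetical helper, and it assumes only that the filesystem object has a put(local_path, remote_path) method, which s3fs.S3FileSystem provides through fsspec. The bucket path in the usage comment is the placeholder from the question.

```python
import os
import posixpath


def upload_files(fs, local_paths, remote_dir):
    """Copy each local file into remote_dir and return the remote paths.

    `fs` is any fsspec-style filesystem exposing put(lpath, rpath).
    """
    remote_paths = []
    for local_path in local_paths:
        # Keep the original file name and place it under the remote directory.
        remote_path = posixpath.join(remote_dir, os.path.basename(local_path))
        fs.put(local_path, remote_path)
        remote_paths.append(remote_path)
    return remote_paths


# Usage with the objects from the question (not executed here):
# fs = s3fs.S3FileSystem(**storage_options)
# upload_files(fs, data_files["train"], "s3://output/dir/here")
```

This copies the Parquet files unchanged, so it skips the preparation step entirely and leaves the files in the bucket exactly as they are on disk.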
