Hi there - I’m looking at adding the Drive Stats data set to Hugging Face. Drive Stats comprises over 10 years of metrics on hard drives spinning in Backblaze’s data centers: nearly 390 million records covering 340,000 drives. Each record contains the date, drive model number, drive serial number, drive capacity, the reported SMART attributes and whether the drive failed on that day. You can read more on the embryonic data set page: backblaze/Drive_Stats · Datasets at Hugging Face
The data is currently hosted in a Backblaze B2 (S3-compatible) bucket, in a number of zip files. Each zip file contains a calendar quarter of data in CSV files, one per day.
The 33 zip files add up to about 21 GB; the 3749 unzipped CSV files about 128 GB.
There are a few options I can think of:
- Upload a file (what format?) containing links to the zips.
- Upload the zip files.
- Upload the CSV files, either ‘flat’ in a single directory or partitioned into subdirectories by year and month.
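To make the partitioned option concrete, here is a rough sketch of the layout I have in mind. It assumes the daily files are named like `2023-01-15.csv`; that naming, the `YYYY/MM/` directory scheme and the `partitioned_path` helper are all just illustrative assumptions, not something Hugging Face requires.

```python
from pathlib import PurePosixPath


def partitioned_path(csv_name: str) -> str:
    """Map a daily file name (assumed 'YYYY-MM-DD.csv') to 'YYYY/MM/<file name>'."""
    stem = PurePosixPath(csv_name).stem      # e.g. '2023-01-15'
    year, month, _day = stem.split("-")      # raises ValueError if the name doesn't match
    return f"{year}/{month}/{csv_name}"


print(partitioned_path("2023-01-15.csv"))
```

The ‘flat’ alternative would simply keep every file name as-is in one directory.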
Is there any ‘best practice’ here?