How to load a huggingface dataset from local path?

Take a simple example from this page, https://huggingface.co/datasets/Dahoas/rm-static:

If I want to load this dataset online, I just use:

from datasets import load_dataset
dataset = load_dataset("Dahoas/rm-static") 

What if I want to load the dataset from a local path? I first downloaded the files, keeping the same folder structure as the "Files and versions" tab on the web page:

-data
|-test-00000-of-00001-bf4c733542e35fcb.parquet
|-train-00000-of-00001-2a1df75c6bce91ab.parquet
-.gitattributes
-README.md
-dataset_infos.json

Then I put them into my folder, but loading fails with an error:

dataset_path ="/data/coco/dataset/Dahoas/rm-static"
tmp_dataset = load_dataset(dataset_path)

It shows: FileNotFoundError: No (supported) data files or dataset script found in /data/coco/dataset/Dahoas/rm-static.

If you want to load a dataset from your local path, you should follow the approach below.
See the docs: load_dataset accepts a parameter named path, which can point to a .py script that processes your local files.

Thanks for your reply, but, for example, there is no .py file in Dahoas/rm-static, and even though I put README.md and .gitattributes in "/data/coco/dataset/Dahoas/rm-static", it still shows the above error.

I solved this problem:
from datasets import load_dataset
# The file names must match the files actually present in the data folder.
data_files = {"train": "train-00000-of-00001-2a1df75c6bce91ab.parquet", "test": "test-00000-of-00001-8c7c51afc6d45980.parquet"}
raw_datasets = load_dataset("parquet", data_dir="/Your/Path/Dahoas/rm-static/data", data_files=data_files)
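
A small sketch that builds on the solution above: instead of typing the hashed shard names by hand, the data_files mapping can be built by scanning the data folder. The helper names build_data_files and load_local_parquet are hypothetical (they are not part of the datasets library), and the sketch assumes each shard name starts with its split name followed by a dash, as in the folder listing in the question.

```python
from pathlib import Path


def build_data_files(data_dir):
    """Map split names to the parquet shards found directly in data_dir.

    Assumption: shard names look like "<split>-00000-of-00001-<hash>.parquet",
    so the text before the first dash is taken as the split name.
    """
    data_files = {}
    for shard in sorted(Path(data_dir).glob("*.parquet")):
        split = shard.name.split("-", 1)[0]  # "train-00000-..." -> "train"
        data_files.setdefault(split, []).append(str(shard))
    return data_files


def load_local_parquet(data_dir):
    # Imported lazily so build_data_files works even without `datasets` installed.
    from datasets import load_dataset

    # "parquet" selects the generic parquet builder; data_files maps each
    # split name to a list of local file paths.
    return load_dataset("parquet", data_files=build_data_files(data_dir))
```

Usage would then be something like raw_datasets = load_local_parquet("/Your/Path/Dahoas/rm-static/data"). This avoids a mismatch between the names typed in data_files and the names of the files on disk.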

Thank you!!!

Good solution, bro!