I am trying to add a Hugging Face dataset that combines multiple data sources.
I have dataset files in ./data/ that look like the following:
./data/datasetA/
./data/datasetA/datasetA_train.json
./data/datasetA/datasetA_test.json
./data/datasetA/datasetA.tar.gz
./data/dataset_B/
./data/dataset_B/datasetB_train.json
./data/dataset_B/datasetB_test.json
./data/dataset_B/datasetB.tar.gz
All the files are stored using git-lfs. If I first run
git lfs pull --include "./data/*/*.json"
git lfs pull --include "./data/*/*.tar.gz"
then DownloadManager.download('data/datasetA/datasetA_train.json') works.
What if I have not pulled them locally with git-lfs, though? Can I still use the DownloadManager to load each of these files?
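
For context on the "not pulled" case: as far as I understand, a git-lfs file that has not been pulled exists locally only as a small text pointer whose first line is `version https://git-lfs.github.com/spec/v1`, so it would not parse as the expected JSON. Below is a minimal sketch for spotting such files, assuming that standard pointer format. The helper names `is_lfs_pointer` and `unpulled_files` are my own, not part of the datasets library.

```python
from pathlib import Path

# First line of a Git LFS pointer file (assumed standard pointer format).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def unpulled_files(data_dir="data"):
    """List files under `data_dir` that still look like LFS pointers."""
    return sorted(
        p.as_posix()
        for p in Path(data_dir).rglob("*")
        if p.is_file() and is_lfs_pointer(p)
    )
```

Calling `unpulled_files("data")` from the repository root should list whichever of the .json and .tar.gz files above are still pointers. It only reports the state of the local checkout and does not download anything.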