How can I break the loading script into multiple code files (all in the dataset repo)? For example, a loader in one file and a preprocessing format class in another.
You can do relative imports. For example, if you have my_dataset.py containing the loader and processing.py containing your processing, you can import the processing into the loader:
from .processing import my_process_fn
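If you want to try the relative import on its own first, here is a rough sketch. It only shows plain Python package behaviour, not how the datasets library itself loads scripts. The folder name my_dataset_repo, the load_examples function, and the body of my_process_fn are made up for illustration; only the two file names and the import line come from the answer above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical contents of processing.py (the lower-casing is a placeholder).
PROCESSING_SRC = '''
def my_process_fn(example):
    return {**example, "text": example["text"].lower()}
'''

# Hypothetical contents of my_dataset.py; the import line is the one from the answer.
LOADER_SRC = '''
from .processing import my_process_fn

def load_examples(raw_examples):
    return [my_process_fn(ex) for ex in raw_examples]
'''


def build_and_import_loader(root: Path):
    """Write both files into a temporary package and import the loader module."""
    pkg = root / "my_dataset_repo"  # assumed folder name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "processing.py").write_text(PROCESSING_SRC)
    (pkg / "my_dataset.py").write_text(LOADER_SRC)
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        # The relative import inside my_dataset.py is resolved during this call.
        return importlib.import_module("my_dataset_repo.my_dataset")
    finally:
        sys.path.remove(str(root))


with tempfile.TemporaryDirectory() as tmp:
    loader = build_and_import_loader(Path(tmp))
    result = loader.load_examples([{"text": "Hello"}])
    print(result)
```

The point is just that both files need to sit side by side in the same folder for the leading dot to resolve.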
What about a folder with JSON files in the dataset repo? Is there any way to access it locally?
For data files like JSON, you must use the dl_manager.download() method in the dataset script.
This way, when the dataset is loaded from the cloud, the data files are downloaded, and when it is loaded locally, the method simply returns the path to the local file (nothing to download).
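A minimal sketch of how that could look, with some assumptions: the path data/train.json and the function names are hypothetical, and the two functions stand in for what would normally be the _split_generators and _generate_examples methods of your builder class. LocalStubManager is not part of the datasets library; it is only a stand-in so the sketch runs without it. In the real script you would use the dl_manager object that is passed in.

```python
import json
import os
import tempfile

# Hypothetical relative path inside the dataset repo.
DATA_FILES = {"train": "data/train.json"}


def resolve_data_files(dl_manager, data_files=DATA_FILES):
    # In a loading script this call would sit inside _split_generators(self, dl_manager).
    # download() returns local paths in the same structure it was given (a dict here).
    return dl_manager.download(data_files)


def generate_examples(filepath):
    # Mirrors _generate_examples: yield (key, example) pairs from one JSON file.
    with open(filepath, encoding="utf-8") as f:
        for idx, record in enumerate(json.load(f)):
            yield idx, record


class LocalStubManager:
    """Illustration-only stand-in that resolves paths against a local folder."""

    def __init__(self, base_path):
        self.base_path = base_path

    def download(self, paths):
        return {split: os.path.join(self.base_path, p) for split, p in paths.items()}


# Try it against a throwaway folder laid out like the dataset repo.
with tempfile.TemporaryDirectory() as repo:
    os.makedirs(os.path.join(repo, "data"))
    with open(os.path.join(repo, "data", "train.json"), "w", encoding="utf-8") as f:
        json.dump([{"text": "a"}, {"text": "b"}], f)

    local_paths = resolve_data_files(LocalStubManager(repo))
    examples = list(generate_examples(local_paths["train"]))
    print(examples)
```

The script never builds the file location itself: it always goes through download() and reads from whatever path comes back.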