How can I split the loading script into multiple code files, all in the dataset repo? For example, a loader in one file and a preprocessing class imported from another.
You can use relative imports.
E.g. if you have my_dataset.py containing the loader and processing.py containing your processing code, you can import the processing into the loader:
from .processing import my_process_fn
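To make the split concrete, here is a minimal sketch of what the helper file could look like. The body of my_process_fn and the "text" field are assumptions for illustration only; the file and function names come from the answer above.

```python
# processing.py (helper module sitting next to my_dataset.py in the dataset repo)

def my_process_fn(example):
    """Return a copy of the example with its "text" field stripped and lowercased.

    The "text" field is a hypothetical column name used only for this sketch.
    """
    cleaned = dict(example)
    cleaned["text"] = example["text"].strip().lower()
    return cleaned


# my_dataset.py would then begin with the relative import:
#     from .processing import my_process_fn
# and could call it while generating examples, e.g.:
#     yield idx, my_process_fn(row)
```

Keeping the processing in its own module also means it can be tested without running the loader.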
What about a folder with JSON files in the dataset repo? Is there any way to access it locally?
For data files like JSON, use the dl_manager.download() method in the dataset script.
This way, when the dataset is loaded from the cloud it downloads the data files, and when loaded locally it simply returns the path to the local file (nothing to download).
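As a rough sketch of how this might look, assuming a Hugging Face datasets loading script: the two functions below mirror the bodies of a builder's _split_generators and _generate_examples methods, written as plain functions so they can be read on their own. The file paths and the JSON layout (a list of objects per file) are hypothetical.

```python
import json

# Hypothetical paths, relative to the loading script inside the dataset repo.
_DATA_FILES = {"train": "data/train.json", "test": "data/test.json"}


def split_generators(dl_manager):
    """Sketch of the body of _split_generators.

    dl_manager.download() accepts a str, list or dict and returns the same
    structure with each entry replaced by a local file path. A real script
    would wrap each returned path in a datasets.SplitGenerator.
    """
    return dl_manager.download(_DATA_FILES)


def generate_examples(filepath):
    """Sketch of the body of _generate_examples: yield (key, example) pairs."""
    with open(filepath, encoding="utf-8") as f:
        records = json.load(f)  # assumes the file holds a JSON list of objects
    for idx, record in enumerate(records):
        yield idx, record
```

Because the script only ever sees the paths returned by download(), the same code works whether the files had to be fetched or were already on disk.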