How to use a local version of the super_glue dataset instead of downloading it?

Because of technical limitations, I can't download anything from the web during training, so I need to use a local copy of the super_glue dataset.

I have cloned the super_glue repo (main branch) with its files, but I can't figure out how to load these downloaded files with load_dataset. It seems that super_glue.py always downloads the .zip files containing the data, and therefore ignores the data_files=PATH_TO_ZIP.zip argument I pass.

Am I missing something, or is there no way to make super_glue.py preprocess pre-downloaded .zip files?
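If the .zip files are already on disk, their contents can at least be read without super_glue.py. The sketch below is a stdlib-only stopgap, not a way to make load_dataset use the archives. It assumes each archive holds JSON Lines files (the member name BoolQ/val.jsonl used in the usage note is an assumption, so list the archive first), and it does not reproduce the script's preprocessing, such as label names and feature types. The helper names list_members and read_jsonl_from_zip are hypothetical.

```python
import json
import zipfile


def list_members(zip_path):
    """Return the file names stored in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()


def read_jsonl_from_zip(zip_path, member):
    """Parse a JSON Lines file inside a zip archive into a list of dicts."""
    rows = []
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as fh:  # binary stream, iterated line by line
            for raw in fh:
                line = raw.decode("utf-8").strip()
                if line:  # skip blank lines
                    rows.append(json.loads(line))
    return rows
```

For example, list_members("BoolQ.zip") shows which files the archive contains, and read_jsonl_from_zip("BoolQ.zip", "BoolQ/val.jsonl") returns plain dicts that could then be wrapped with datasets.Dataset.from_list in recent datasets releases.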

Hi @kefirski!
Unfortunately, it's currently not possible to provide a path to locally downloaded files.

Instead, you can run load_dataset("super_glue", "config_name") once while you have internet access: it will download all the files, extract them, prepare the dataset, and cache it. The next time you run load_dataset("super_glue", "config_name"), the dataset will be loaded from the cached Arrow files, so you won't need an internet connection. Note that you need to do this for each config you want to use.
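A minimal sketch of that cache-then-load workflow, assuming a datasets version that can still run the super_glue script. The config list is an assumption to check with datasets.get_dataset_config_names("super_glue") while online. HF_DATASETS_OFFLINE=1 is the library's offline-mode switch; it is read when datasets is imported, which is why the imports sit inside the functions. The names precache, enable_offline_mode, load_cached and SUPER_GLUE_CONFIGS are hypothetical.

```python
import os

# Assumed SuperGLUE config names; verify with
# datasets.get_dataset_config_names("super_glue") while you are online.
SUPER_GLUE_CONFIGS = [
    "boolq", "cb", "copa", "multirc", "record", "rte",
    "wic", "wsc", "wsc.fixed", "axb", "axg",
]


def precache(configs, cache_dir=None):
    """Run once with internet access: download, extract, prepare and cache each config."""
    from datasets import load_dataset  # imported lazily; see enable_offline_mode

    for name in configs:
        # Each config is cached separately, so every one you need must be loaded here.
        load_dataset("super_glue", name, cache_dir=cache_dir)


def enable_offline_mode():
    """Ask datasets not to touch the network.

    The flag is read when `datasets` is first imported, so this only has an
    effect if it runs before that import in the current process.
    """
    os.environ["HF_DATASETS_OFFLINE"] = "1"


def load_cached(name, cache_dir=None):
    """Load a previously cached config without an internet connection."""
    enable_offline_mode()
    from datasets import load_dataset

    return load_dataset("super_glue", name, cache_dir=cache_dir)
```

Run precache(SUPER_GLUE_CONFIGS) on a machine with internet access. If training happens on a different machine, copy the cache directory (by default ~/.cache/huggingface/datasets, or whatever cache_dir you passed) and call load_cached("boolq", cache_dir=...) there in a fresh process.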