You can use the wget command followed by the file’s URL, which should have the following format: <HUB_REPO_URL>/resolve/main/<FILE_NAME>. If you are unsure about the exact URL, you can go to the “Files and versions” section and right-click the little arrow next to the file size to select the “Copy link address” option.
For instance, this would be a way to download the MRPC corpus that you mention:
wget https://huggingface.co/datasets/glue/resolve/main/dataset_infos.json
wget https://huggingface.co/datasets/glue/resolve/main/glue.py
Then you can start Python in the same directory and run:
from datasets import load_dataset
mrpc = load_dataset("./glue.py", "mrpc")
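If wget is not available, the same download could be sketched in Python using only the standard library. This is a minimal sketch, not an official Hub API: the helper names hub_file_url and download_hub_file are made up for illustration, and it assumes the URL pattern described above and that the file is publicly accessible without authentication.

```python
import shutil
import urllib.request
from pathlib import Path


def hub_file_url(repo_url: str, file_name: str, revision: str = "main") -> str:
    """Build <HUB_REPO_URL>/resolve/<revision>/<FILE_NAME> (hypothetical helper)."""
    return f"{repo_url.rstrip('/')}/resolve/{revision}/{file_name.lstrip('/')}"


def download_hub_file(repo_url: str, file_name: str, dest_dir: str = ".") -> Path:
    """Download one file from a Hub repo into dest_dir and return the local path.

    Assumes the file is public; private repos would need an auth header.
    """
    target = Path(dest_dir) / Path(file_name).name
    url = hub_file_url(repo_url, file_name)
    # Stream the response to disk instead of loading it all into memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

With these helpers, download_hub_file("https://huggingface.co/datasets/glue", "glue.py") would request the same URL as the second wget command above and save glue.py in the current directory.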