Chapter 5 questions

Apologies if this has been addressed elsewhere, but when I try to load the dataset, I get the error below:

from datasets import load_dataset

# This takes a few minutes to run, so go grab a tea or coffee while you wait :)
data_files = "https://mystic.the-eye.eu/public/AI/pile_preliminary_components/PUBMED_title_abstracts_2019_baseline.jsonl.zst"
pubmed_dataset = load_dataset("json", data_files=data_files, split="train")
pubmed_dataset

ConnectionError: HTTPSConnectionPool(host='mystic.the-eye.eu', port=443): Max retries exceeded with url: /public/AI/pile_preliminary_components/PUBMED_title_abstracts_2019_baseline.jsonl.zst (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f3bc5c6bd50>: Failed to establish a new connection: [Errno 111] Connection refused'))

I changed the URL to

data_files = "https://the-eye.eu/public/AI/pile_preliminary_components/PUBMED_title_abstracts_2019_baseline.jsonl.zst"

and now the dataset loads successfully. I thought I'd share it here in case anyone else gets stuck at the same step.
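
In case it helps, here is a rough sketch of how one could try both hosts in order instead of hard-coding one. The helper name `load_from_first_working` and the `CANDIDATE_URLS` list are my own (not part of the datasets library), and I'm assuming the failure surfaces as an `OSError` subclass such as the built-in `ConnectionError` shown in the traceback above. I can't promise either host will stay up.

```python
# Hypothetical helper: try each candidate URL and keep the first one that loads.
CANDIDATE_URLS = [
    "https://mystic.the-eye.eu/public/AI/pile_preliminary_components/PUBMED_title_abstracts_2019_baseline.jsonl.zst",
    "https://the-eye.eu/public/AI/pile_preliminary_components/PUBMED_title_abstracts_2019_baseline.jsonl.zst",
]


def load_from_first_working(urls, loader):
    """Call loader(url) for each URL in order; return (url, result) for the first success."""
    failures = []
    for url in urls:
        try:
            return url, loader(url)
        except OSError as exc:  # assumption: connection failures are OSError subclasses
            failures.append((url, repr(exc)))
    # Every candidate failed, so report all of them at once.
    raise ConnectionError(f"All candidate URLs failed: {failures}")


# Usage with the course code (left commented out because it downloads the file):
# from datasets import load_dataset
# url, pubmed_dataset = load_from_first_working(
#     CANDIDATE_URLS,
#     lambda u: load_dataset("json", data_files=u, split="train"),
# )
```

The loader is passed in as a function so the same helper works for any download call, and so it can be tried out with a fake loader without touching the network.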