I am trying to load a dataset, but it gives me an error.
The code I am using is below:
from datasets import load_dataset
yhavi = load_dataset("yhavinga/ccmatrix", "ur-en")
One thing: at first the dataset was downloading, but Chrome crashed during the download, and now I am not able to download the dataset.
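If the interrupted download left a partial or corrupted copy in the local cache (an assumption, since the exact error message is not shown here), one thing to try is asking datasets to ignore the cache and fetch everything again. A minimal sketch: the helper name reload_ccmatrix is hypothetical, while download_mode="force_redownload" is an existing load_dataset option.

```python
def reload_ccmatrix(config="ur-en"):
    """Re-download yhavinga/ccmatrix instead of reusing cached files.

    Hypothetical helper for illustration only; it assumes the failure
    comes from a partial download left behind in the local cache.
    """
    # Imported inside the function so that defining the helper does not
    # itself require loading the library.
    from datasets import load_dataset

    # "force_redownload" makes datasets fetch and prepare the data again
    # rather than reusing what is already in the cache.
    return load_dataset(
        "yhavinga/ccmatrix",
        config,
        download_mode="force_redownload",
    )
```

Calling yhavi = reload_ccmatrix() would then replace the original load_dataset call. This will not help if the underlying cause is something else, such as an unreachable source URL.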
Hi! Our error message is misleading, but the problem is that this pile URL is not reachable. The next release of datasets will raise: FileNotFoundError: Unable to find 'https://the-eye.eu/public/AI/pile_preliminary_components/PUBMED_title_abstracts_2019_baseline.jsonl.zst'
Hi @mariosasko, the error datasets.builder.DatasetGenerationError: An error occurred while generating the dataset also occurs when creating issues_dataset as per the HF tutorial "Creating your own dataset" in the Hugging Face NLP Course. The only things that worked for me were either using streaming or reading the JSONL file with pd.read_json and the lines=True argument.
How can we load issues_dataset using the datasets API?
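For reference, the streaming workaround mentioned above might look roughly like the sketch below. The helper name stream_issues is hypothetical, and the file name datasets-issues.jsonl is assumed to be the one produced in the course chapter.

```python
def stream_issues(path="datasets-issues.jsonl"):
    """Open a JSON Lines file as a streaming dataset.

    Hypothetical helper for illustration; the default path is assumed
    to be the file created in the course chapter.
    """
    # Imported inside the function so that defining the helper does not
    # itself require loading the library.
    from datasets import load_dataset

    # With streaming=True the records are read lazily while iterating,
    # so the full table is not generated up front.
    return load_dataset("json", data_files=path, split="train", streaming=True)
```

A first record can then be inspected with next(iter(stream_issues())).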
After a bit of experimenting, the fix that worked for me was loading the *.jsonl file with pd.read_json and then converting it into a Dataset using the datasets API.
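import pandas as pd
# Read the JSON Lines file (one JSON object per line) into a DataFrame
df = pd.read_json("datasets-issues.jsonl", lines=True)
df.head()
from datasets import Dataset
# Convert the DataFrame into a datasets.Dataset
issues_dataset = Dataset.from_pandas(df)
issues_dataset
# Shuffle with a fixed seed, keep three examples, and look at the first one
sample = issues_dataset.shuffle(seed=666).select(range(3))
sample[0]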
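To get a hint about why the direct load fails, it may help to check whether some fields change type from one record to the next, for example null in some lines and a string in others. That is only one possible cause and is not confirmed in this thread. A minimal standard-library sketch: the helper names field_types and mixed_fields are hypothetical, and every line of the file is assumed to hold one JSON object.

```python
import json
from collections import defaultdict


def field_types(path):
    """Map each top-level field to the set of Python type names seen for it."""
    seen = defaultdict(set)
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)  # assumes one JSON object per line
            for key, value in record.items():
                seen[key].add(type(value).__name__)
    return dict(seen)


def mixed_fields(path):
    """Return only the fields whose values have more than one type."""
    return {k: v for k, v in field_types(path).items() if len(v) > 1}
```

Running mixed_fields("datasets-issues.jsonl") lists the fields worth a closer look.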