Hello there. I downloaded a dataset from the Hub and saved it to a local folder, but I cannot reload it. Here is my code:
from datasets import load_dataset
raw_datasets = load_dataset("roneneldan/TinyStories")
raw_datasets.save_to_disk("Tiny_Stories")
raw_datasets = load_dataset('text', data_dir = "Tiny_Stories")
Here is the error:
UnicodeDecodeError Traceback (most recent call last)
File c:\Users\Tom W\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\builder.py:1925, in ArrowBasedBuilder._prepare_split_single(self, gen_kwargs, fpath, file_format, max_shard_size, job_id)
1924 _time = time.time()
-> 1925 for _, table in generator:
1926 if max_shard_size is not None and writer._num_bytes > max_shard_size:
File c:\Users\Tom W\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\packaged_modules\text\text.py:89, in Text._generate_tables(self, files)
88 while True:
---> 89 batch = f.read(self.config.chunksize)
90 if not batch:
File c:\Users\Tom W\AppData\Local\Programs\Python\Python310\lib\codecs.py:322, in BufferedIncrementalDecoder.decode(self, input, final)
321 data = self.buffer + input
--> 322 (result, consumed) = self._buffer_decode(data, self.errors, final)
323 # keep undecoded input until the next call
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
The above exception was the direct cause of the following exception:
DatasetGenerationError Traceback (most recent call last)
d:\16ComputerScience\Pattern_Recongnition_and_Machine_Learning\Deeplearning_research_oriented\Final_project\Transfromer_Learn\Tokenizer.ipynb Cell 11 in ()
1 from datasets import load_dataset
----> 3 raw_datasets = load_dataset('text', data_dir = "Tiny_Stories")
...
1957 e = e.__context__
-> 1958 raise DatasetGenerationError("An error occurred while generating the dataset") from e
1960 yield job_id, True, (total_num_examples, total_num_bytes, writer._features, num_shards, shard_lengths)
DatasetGenerationError: An error occurred while generating the dataset
Can anyone help?
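A possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: `save_to_disk` writes Arrow files plus JSON metadata, not plain text, so the `text` loader ends up trying to decode binary data as UTF-8, which would explain the `0xff` at position 0. A folder written by `save_to_disk` is normally reloaded with `load_from_disk` instead. In the snippet below, `is_utf8_text` and `reload_saved_dataset` are hypothetical helper names, and the two temporary files are stand-ins (the binary one is not a real Arrow file), used only to reproduce the decode failure without downloading anything.

```python
import codecs
import tempfile
from pathlib import Path


def is_utf8_text(path, sample_size=4096):
    """Return True if the first `sample_size` bytes of `path` decode as UTF-8."""
    with open(path, "rb") as f:
        chunk = f.read(sample_size)
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the chunk end
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return False
    return True


def reload_saved_dataset(folder):
    """Reload a folder written by save_to_disk (assumes `datasets` is installed)."""
    # Imported lazily so the check above runs with the standard library alone
    from datasets import load_from_disk
    return load_from_disk(folder)


# Stand-in files: one real text file, one binary file starting with 0xff
with tempfile.TemporaryDirectory() as tmp:
    text_file = Path(tmp) / "stories.txt"
    binary_file = Path(tmp) / "data.bin"
    text_file.write_text("Once upon a time.\n", encoding="utf-8")
    binary_file.write_bytes(b"\xff\xff\xff\xff" + b"\x00" * 16)
    text_ok = is_utf8_text(text_file)
    binary_ok = is_utf8_text(binary_file)

print(text_ok, binary_ok)
```

With the folder from the code above, the reload would then be `raw_datasets = reload_saved_dataset("Tiny_Stories")`, assuming the folder was written by `save_to_disk` and has not been modified since.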