I used the following code to load arxiv_sample.jsonl from the 1B-token RedPajama-Data-1T-Sample dataset, but got a FileNotFoundError. However, when I click the link, I can in fact download the .jsonl file manually. Any idea why this happens? How can I load the file from code?
from datasets import load_dataset
dataset = load_dataset("togethercomputer/RedPajama-Data-1T-Sample", data_files="arxiv_sample.jsonl")
FileNotFoundError: Unable to find 'https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample/resolve/main/arxiv_sample.jsonl'
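
One check that might help narrow this down is to list the files the Hub reports for the repo and compare them with the name passed to data_files. The following is only a minimal sketch, assuming huggingface_hub is installed. The helper names find_matching_paths and list_dataset_files are my own, not part of any library, and the file paths in the comments are hypothetical.

```python
import posixpath


def find_matching_paths(repo_files, filename):
    """Return every repo path whose basename equals `filename`."""
    return [p for p in repo_files if posixpath.basename(p) == filename]


def list_dataset_files(repo_id):
    """Ask the Hub for the file listing of a dataset repo (needs network access)."""
    # Imported lazily so find_matching_paths works without huggingface_hub installed.
    from huggingface_hub import list_repo_files

    return list_repo_files(repo_id, repo_type="dataset")


# Example usage (not run here because it needs network access):
# files = list_dataset_files("togethercomputer/RedPajama-Data-1T-Sample")
# print(find_matching_paths(files, "arxiv_sample.jsonl"))
```

If the returned path differs from the bare file name (for example, if it sits in a subdirectory), passing that exact path as data_files may be worth trying. If the list comes back empty, the file may not be visible to the library under that name at all.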