RedPajama-1T uses a custom dataset loading script to download files hosted outside of HF, which can lead to unexpected failures. You could ask the authors about hosting the files on HF directly by opening a discussion: togethercomputer/RedPajama-Data-1T · Discussions
There are already retry mechanisms in `datasets` / `huggingface_hub` when streaming files from HF.
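Those retries may not cover files fetched from outside HF, so as a stopgap you could wrap the failing download yourself. Here is a minimal sketch of a generic retry with exponential backoff. It is not part of `datasets` or `huggingface_hub`; `with_retries` and its parameters are hypothetical names made up for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() and retry on OSError with exponential backoff.

    Hypothetical helper, not a library API. ConnectionError and
    TimeoutError are subclasses of OSError, so they are covered.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except OSError:
            # Out of retries: re-raise the last error to the caller.
            if attempt == max_retries:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

You would pass the download call as a zero-argument function, for example `with_retries(lambda: urllib.request.urlretrieve(url, path))`, where `url` and `path` are placeholders for whatever file keeps failing.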