Error Handling in IterableDataset?

RedPajama-1T uses a custom dataset loading script that downloads files from outside of HF, which can lead to unexpected failures. Maybe you can ask the authors whether they could host the files on HF directly by opening a discussion: togethercomputer/RedPajama-Data-1T · Discussions

There are already retry mechanisms in datasets / huggingface_hub when streaming files from HF, but they don't apply to files that a loading script fetches from elsewhere.
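In the meantime, if you want to handle the errors yourself, here is a minimal sketch of a retry wrapper around any iterable. It is not part of datasets: `iterate_with_retries`, `make_iterable`, `max_retries` and `delay` are hypothetical names I made up for illustration. It assumes the dataset yields examples in the same order each time it is re-created, and it re-reads the already seen examples after a failure, which can be slow.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def iterate_with_retries(
    make_iterable: Callable[[], Iterable[T]],
    max_retries: int = 3,
    delay: float = 1.0,
) -> Iterator[T]:
    """Yield items from make_iterable(), restarting it after a failure.

    Hypothetical helper: assumes make_iterable() returns the items in the
    same order on every call, so items already yielded can be skipped.
    """
    yielded = 0   # number of items already handed to the caller
    retries = 0   # consecutive failures since the last successful item
    while True:
        try:
            for index, item in enumerate(make_iterable()):
                if index < yielded:
                    continue  # already seen before the last failure
                yield item
                yielded += 1
                retries = 0
            return  # iterable exhausted without error
        except Exception:
            retries += 1
            if retries > max_retries:
                raise  # give up and surface the original error
            time.sleep(delay)  # simple fixed wait before restarting
```

You would pass a function that re-creates the streaming dataset, e.g. `iterate_with_retries(lambda: my_streaming_dataset)`, where `my_streaming_dataset` is whatever `load_dataset(..., streaming=True)` returned for you. If the restart cost matters, `IterableDataset.skip(n)` may be a better way to jump past the examples you already consumed.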
