Your error comes from caching the dataset. The datasets library caches the dataset on disk in order to work with it. The default cache_dir is ~/.cache/huggingface/datasets, and this directory does not seem to be on the mounted EBS volume.
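One possible workaround is to point the cache at a directory that does live on the EBS volume. The following is only a minimal sketch: the helper name use_cache_dir and the placeholder directory are made up for illustration (a temp directory is used here so the snippet runs anywhere), so substitute a path on your own mounted volume. It relies on the HF_DATASETS_CACHE environment variable and the cache_dir argument of load_dataset.

```python
import os
import tempfile


def use_cache_dir(cache_dir: str) -> str:
    """Create cache_dir if needed and point the datasets cache at it."""
    cache_dir = os.path.abspath(os.path.expanduser(cache_dir))
    os.makedirs(cache_dir, exist_ok=True)
    # Set the variable before importing `datasets`, which reads it at import time.
    os.environ["HF_DATASETS_CACHE"] = cache_dir
    return cache_dir


# Placeholder location (assumption): replace with a directory on the mounted EBS volume.
cache_dir = use_cache_dir(os.path.join(tempfile.gettempdir(), "hf_datasets_cache"))

# With the `datasets` library installed, the directory can also be passed explicitly:
# from datasets import load_dataset
# dataset = load_dataset("your_dataset_name", cache_dir=cache_dir)
```

Passing cache_dir explicitly only affects that one call, while the environment variable changes the default for every later load_dataset call in the same process.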
I think it’s not that way, it’s around here.