Dataset not caching

I have a Gradio app in a Space. Its first action is:

from datasets import load_dataset, Audio
self.dataset = load_dataset(dataset_name)
self.dataset = self.dataset.cast_column("audio", Audio(sampling_rate=16000))

I’ve added persistent storage to my Space, but every time I restart the app with a git push, it spends several minutes reloading the dataset. It doesn’t seem to be caching it at all. What am I missing?


Hi @danavery,

To cache the dataset in persistent storage, you have to move the Hugging Face cache folder under /data, which is where the persistent storage is mounted.

Go to your Space's settings and set the variable HF_HOME to /data/.huggingface
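
As a rough way to check where the cache will end up, here is a minimal sketch that mirrors the documented default layout (hub cache in $HF_HOME/hub, datasets cache in $HF_HOME/datasets, and HF_HOME falling back to ~/.cache/huggingface when unset). The helpers resolve_hf_cache_dirs and is_persistent are hypothetical names made up for illustration, not part of huggingface_hub or datasets, and the sketch ignores XDG_CACHE_HOME.

```python
import os
from pathlib import PurePosixPath


def resolve_hf_cache_dirs(env=None):
    """Hypothetical helper: compute the cache directories implied by HF_* variables."""
    env = os.environ if env is None else env
    # Documented fallback when HF_HOME is unset (XDG_CACHE_HOME is ignored in this sketch).
    hf_home = PurePosixPath(env.get("HF_HOME", "~/.cache/huggingface"))
    return {
        "hf_home": str(hf_home),
        # Each cache can also be overridden individually.
        "hub": env.get("HF_HUB_CACHE", str(hf_home / "hub")),
        "datasets": env.get("HF_DATASETS_CACHE", str(hf_home / "datasets")),
    }


def is_persistent(path, mount="/data"):
    """Hypothetical helper: True if path is the mount point or lies inside it."""
    p = PurePosixPath(path)
    m = PurePosixPath(mount)
    return p == m or m in p.parents
```

Calling resolve_hf_cache_dirs() with no arguments at the top of the app reads the real environment, so printing its result in the Space logs should show whether the datasets cache resolves to a path under /data or to the default location, which is not kept across restarts.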

Thank you!
I had wrongly assumed it would use the default cache path somehow, but it’s helpful to know that it doesn’t.
