How to tell if datasets is in streaming mode inside a dataset script


I'm working with ZipStore Zarrs for geospatial datasets. One example is openclimatefix/mrms · Datasets at Hugging Face, which stores a few hundred GB of precipitation data; another is this one of GFS forecasts: openclimatefix/gfs-reforecast at main. To open the Zarrs properly for streaming, I combine the HF URL with the local path and load the file that way. The issue is that I'm not sure how to tell, inside the script, whether the dataset is in streaming mode. If it is streaming, I should add the URL; if it isn't, I shouldn't.

You can check whether you're in streaming mode by using dl_manager.is_streaming, which is a bool. Does this help?
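
As a rough sketch of how that check might be used for the path handling described in the question: the helper name resolve_zarr_path, the FakeDownloadManager stand-in, and the placeholder base URL below are all hypothetical and only illustrate the branching. In a real dataset script the dl_manager would be the object passed to _split_generators.

```python
from dataclasses import dataclass


def resolve_zarr_path(dl_manager, relative_path: str, base_url: str) -> str:
    """Return a remote URL when streaming, otherwise the relative path unchanged."""
    # getattr with a default is a defensive assumption, in case the
    # download manager object does not expose is_streaming.
    if getattr(dl_manager, "is_streaming", False):
        # Streaming: point at the remote file instead of a local copy.
        return f"{base_url.rstrip('/')}/{relative_path.lstrip('/')}"
    # Not streaming: leave the path alone so it can be downloaded as usual.
    return relative_path


@dataclass
class FakeDownloadManager:
    """Hypothetical stand-in for the dl_manager a dataset script receives."""
    is_streaming: bool


# Placeholder URL, not a real repository.
BASE_URL = "https://example.com/datasets/some-org/some-dataset/resolve/main"

streaming_path = resolve_zarr_path(FakeDownloadManager(True), "data/2021.zarr.zip", BASE_URL)
local_path = resolve_zarr_path(FakeDownloadManager(False), "data/2021.zarr.zip", BASE_URL)
print(streaming_path)
print(local_path)
```

Keeping the branch in one small helper means the rest of the script does not need to know which mode it is running in.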

Okay, cool, yeah that would work perfectly!
