How to tell if datasets is in streaming mode inside a dataset script

Hi,

I’m working with ZipStore Zarrs for geospatial datasets. One example is openclimatefix/mrms on the Hugging Face Hub, which stores a few hundred GB of precipitation data; another is the GFS forecasts dataset defined by gfs-reforecast.py in openclimatefix/gfs-reforecast. To open the Zarrs properly for streaming, I append the HF URL to the local path and load it that way. The issue is that I am not sure how to tell, inside the script, whether the dataset is being loaded in streaming mode. If it is streaming, I should append the URL; otherwise I shouldn’t.

You can check whether you’re in streaming mode with dl_manager.is_streaming, which is a bool. Does this help?
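For illustration, here is a minimal sketch of how that check might be used to pick the path. The helper name resolve_zarr_path, the BASE_URL value, and the file name are hypothetical, and the SimpleNamespace objects only stand in for the dl_manager that a dataset script receives in _split_generators; this is not taken from either script mentioned above.

```python
from types import SimpleNamespace

# Hypothetical URL prefix; replace with the real location of the repo's files.
BASE_URL = "https://example.com/datasets/my-org/my-dataset/resolve/main/"


def resolve_zarr_path(dl_manager, relative_path, base_url=BASE_URL):
    """Return base_url + relative_path when streaming, else relative_path unchanged."""
    # Fall back to False if the object has no is_streaming attribute.
    if getattr(dl_manager, "is_streaming", False):
        return base_url + relative_path
    return relative_path


# Stand-ins for the download manager objects (assumption: only is_streaming matters here).
streaming_manager = SimpleNamespace(is_streaming=True)
local_manager = SimpleNamespace(is_streaming=False)

print(resolve_zarr_path(streaming_manager, "data/2021.zarr.zip"))
print(resolve_zarr_path(local_manager, "data/2021.zarr.zip"))
```

Inside a real script, the same call would go in _split_generators, with the actual dl_manager passed in place of the stand-ins.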

Okay, cool, yeah that would work perfectly!