How can you use a downloaded dataset in streaming mode offline?

I am trying to train an LLM, which requires big datasets. The streaming=True option is really helpful for that, as the Hugging Face tutorials explain. For example, I would like to stream the wikipedia dataset like this:

from datasets import load_dataset
raw_datasets = load_dataset('wikipedia', '20220301.en', split="train", streaming=True)

When I submit a job to a cluster, the nodes on the backend do not have access to the internet, so I need to run in offline mode:

import os
os.environ['HF_DATASETS_OFFLINE'] = "1"
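As far as I understand, the datasets library reads HF_DATASETS_OFFLINE when it is first imported, so the variable has to be set before that import happens. Here is a minimal sketch of the ordering I am assuming; load_wikipedia_stream is just a hypothetical wrapper name for illustration, not something from the library:

```python
import os

# Set the flag before `datasets` is imported; this sketch assumes the
# library reads the variable at import time.
os.environ["HF_DATASETS_OFFLINE"] = "1"


def load_wikipedia_stream():
    """Hypothetical wrapper: request the Wikipedia dump in streaming mode."""
    # Imported inside the function so the environment variable set above
    # is already in place when `datasets` is first loaded.
    from datasets import load_dataset

    return load_dataset(
        "wikipedia", "20220301.en", split="train", streaming=True
    )
```

Calling load_wikipedia_stream() on an offline node is what produces the error described next.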

When I combine the two, streaming and offline mode, I get this error:

No such file or directory: 'data/20220301.en/train-00000-of-00041.parquet'

The first strange thing is that it does not seem to check the standard Hugging Face cache directory for files. Given that this dataset has already been downloaded to

~/.cache/huggingface/datasets/wikipedia/20220301.en

and works just fine in offline mode when streaming=False, how do I get this to also work with streaming=True, which is helpful for giant datasets? Thanks!
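For reference, a small sketch like the following can be used to check what is actually stored in that cache directory. list_cached_files is a hypothetical helper written for this post, not part of the datasets library, and it uses only the standard library:

```python
from pathlib import Path


def list_cached_files(cache_dir):
    """Hypothetical helper: sorted relative paths of all files under cache_dir."""
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        # Nothing cached at this location (or the path is wrong).
        return []
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    cache = "~/.cache/huggingface/datasets/wikipedia/20220301.en"
    for name in list_cached_files(cache):
        print(name)
```

Comparing the file names and extensions it prints with the path in the error message (data/20220301.en/train-00000-of-00041.parquet) should show whether streaming mode is looking for files that the cache does not contain in that form.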