Parsing dataset

Hi, I am examining open-source datasets, specifically tiiuae/falcon-refinedweb. I have set streaming=True to avoid downloading the whole dataset.
Are there any ways to speed up iterating through the dataset? I am currently exploring multiprocessing and numba, but I am not sure numba will work here.
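For concreteness, here is a minimal sketch of the multiprocessing idea, using a stand-in generator instead of the real stream so it runs without any download. Everything named here (`fake_stream`, `batched`, `process_batch`, the `content` field, the batch size and worker count) is an illustrative assumption, not code from the actual setup:

```python
from itertools import islice
from multiprocessing import Pool


def fake_stream(n):
    """Hypothetical stand-in for the streamed dataset: yields dict records."""
    for i in range(n):
        # Assumes each record carries its text under a "content" key.
        yield {"content": f"document number {i} " * 3}


def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def process_batch(batch):
    """Hypothetical per-batch work: count whitespace-separated tokens."""
    return sum(len(rec["content"].split()) for rec in batch)


if __name__ == "__main__":
    stream = fake_stream(10_000)
    with Pool(processes=4) as pool:
        # imap returns the per-batch results in input order.
        total = sum(pool.imap(process_batch, batched(stream, 500)))
    print(total)
```

Sending batches rather than single records keeps the per-task pickling overhead low. Whether this helps at all probably depends on whether the time goes into the per-record work or into fetching the stream itself.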
Any other suggestions are welcome =) TIA!