I would like to grab only the first and second chunks of SlimPajama-627B, which together make up 20% of the tokens.
Is there a way to tweak some configuration file to do this? I can only see how to load a fraction of the dataset once the whole dataset has been downloaded. Is there a way to get load_dataset and/or some configuration script to download only those chunks? Thank you.