Processing a Large Dataset for Training a GPT-2 Model

I am working with a very large data source (230M documents) and am trying to train a GPT-2-style model using the run_clm.py script with DeepSpeed. There is a grouping function in the run_clm.py script (transformers/run_clm.py at main · huggingface/transformers · GitHub) which concatenates the tokenized data and breaks it into chunks of the maximum sequence length.
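
For reference, my understanding of that grouping step is roughly the sketch below (paraphrased, not the exact script code); the toy dataset, block_size, and num_proc values are just placeholders for my real 230M-document setup:

```python
from itertools import chain
from datasets import Dataset
from transformers import AutoTokenizer

# Placeholder values; in my real run these come from the run_clm.py arguments.
block_size = 1024   # stands in for the max sequence length
num_proc = 8        # stands in for --preprocessing_num_workers

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Tiny toy dataset just to make the sketch runnable; my real corpus has ~230M documents.
raw = Dataset.from_dict({"text": ["some document text"] * 100})
tokenized = raw.map(lambda ex: tokenizer(ex["text"]), batched=True, remove_columns=["text"])

def group_texts(examples):
    # Concatenate all tokenized documents in the batch into one long token stream.
    concatenated = {k: list(chain(*examples[k])) for k in examples.keys()}
    total_length = len(concatenated[list(examples.keys())[0]])
    # Drop the remainder so every block is exactly block_size tokens long.
    total_length = (total_length // block_size) * block_size
    # Split the stream into fixed-size blocks.
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result

# This batched map over the whole corpus is the step that is slow for me.
lm_dataset = tokenized.map(group_texts, batched=True, num_proc=num_proc)
```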

Since my dataset is so large, the estimated total time shown for this step is around 10 days, which is far too long for pre-processing. Is there a way I can speed up the process?