The tokenization process for a concatenated dataset is slow at the end of tokenizing

I am tokenizing the English Wikipedia and BookCorpus datasets, concatenated into a single dataset for training GPT-2. Tokenizing each dataset on its own (i.e. not concatenated) is fast, but after concatenation the tokenization process becomes extremely slow near the end. I am using the fast tokenizer option.
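
Here is roughly what I am doing (a minimal sketch; the exact dataset configs and `map` arguments such as `num_proc=4` are from memory and may differ from my actual script):

```python
from datasets import load_dataset, concatenate_datasets
from transformers import AutoTokenizer

# Load both corpora (the Wikipedia config shown is an assumption)
wiki = load_dataset("wikipedia", "20220301.en", split="train")
books = load_dataset("bookcorpus", split="train")

# Wikipedia has extra columns (title, url, ...); keep only "text" so the schemas match
wiki = wiki.remove_columns([c for c in wiki.column_names if c != "text"])

# Concatenate into a single dataset for GPT-2 training
combined = concatenate_datasets([wiki, books])

# Fast (Rust-based) tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2", use_fast=True)

def tokenize_fn(examples):
    return tokenizer(examples["text"])

# Batched map over the combined dataset; the slowdown shows up near the end of this step
tokenized = combined.map(
    tokenize_fn,
    batched=True,
    num_proc=4,
    remove_columns=["text"],
)
```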