Identical datasets -- huge performance difference

Hello HF Forum,
I have two datasets that, as far as I understand, are almost identical. However, the access time for one of them is extremely poor… The problem is described on GH: Almost identical datasets, huge performance difference · Issue #5669 · huggingface/datasets · GitHub

This might be my misunderstanding of the datasets internals. Aren’t those datasets very similar?
What am I missing?