Nlp Datasets: speed-test vs Fastai

Also, out of curiosity, did you try processing the dataset in memory with 🤗nlp, just to get an idea of the difference in speed? By default it uses memory-mapping, which is very fast and uses almost no memory, but in-memory processing could be interesting for users who don’t really care about memory usage.
You can do that by specifying keep_in_memory=True in .sort() and .map().
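Here is a rough sketch of how such a comparison could be timed. The `keep_in_memory` and `load_from_cache_file` arguments of `.sort()` and `.map()` are real. Everything else is an assumption for illustration: the helper names `compare_keep_in_memory` and `sort_then_lowercase`, the `text` and `length` columns, and the `imdb` dataset in the commented example. Caching is disabled so that repeated runs actually redo the work and are not just reloaded from a cache file.

```python
import time
from typing import Callable, Dict


def compare_keep_in_memory(run: Callable[[bool], object], repeats: int = 3) -> Dict[bool, float]:
    """Call run(keep_in_memory) for False, then True; return the best wall-clock time (s) of each."""
    timings: Dict[bool, float] = {}
    for keep_in_memory in (False, True):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(keep_in_memory)
            best = min(best, time.perf_counter() - start)
        timings[keep_in_memory] = best
    return timings


def sort_then_lowercase(dataset, keep_in_memory: bool):
    """Hypothetical pipeline: sort by a 'length' column, then lowercase a 'text' column.

    Assumes `dataset` is an Arrow-backed Dataset from 🤗nlp (now published as
    `datasets`) that already has 'text' and 'length' columns.
    """
    # load_from_cache_file=False forces recomputation, so the timings are comparable.
    ds = dataset.sort("length", keep_in_memory=keep_in_memory, load_from_cache_file=False)
    return ds.map(
        lambda ex: {"text": ex["text"].lower()},
        keep_in_memory=keep_in_memory,
        load_from_cache_file=False,
    )


# Example (needs the `datasets` package and downloads a dataset, so it is left commented out):
# from datasets import load_dataset
# ds = load_dataset("imdb", split="train").map(lambda ex: {"length": len(ex["text"])})
# print(compare_keep_in_memory(lambda k: sort_then_lowercase(ds, k)))
```

The returned dict maps `False` (memory-mapped, the default) and `True` (in memory) to seconds, so the two numbers can be compared directly. Taking the best of several runs reduces noise from the OS page cache warming up after the first memory-mapped pass.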