Accessing a dataset is very slow compared to torchvision

Accessing MNIST examples is about 7 times slower than with torchvision:

import torchvision
import datasets

mnist_hf = datasets.load_dataset("mnist", split="train")
mnist_hf_inmem = datasets.load_dataset("mnist", split="train", keep_in_memory=True)
mnist_tv = torchvision.datasets.MNIST("~/home", train=True, download=True)

def f(data):
    # Access every training example once, by index
    for ids in range(60000):
        data[ids]

%timeit f(mnist_hf)
%timeit f(mnist_hf_inmem)
%timeit f(mnist_tv)
5.21 s ± 126 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
5.06 s ± 86.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
770 ms ± 30.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Is this due to the storage format? Can anything be done about it? At this speed, a training step for a simple convolutional network is dominated by dataset access…

Related (same?) topic: Why is simply accessing dataset features so slow?

Hi! I think this is because the torchvision dataset stores the raw array of pixel values, while the HF dataset stores the images encoded as PNG. As a result, the HF dataset uses significantly less disk space, but every access needs an extra decoding step to get the image.
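
To give a rough feel for that trade-off, here is a minimal sketch using only the standard library. It is not how either library stores MNIST: the image is synthetic, and `zlib` (the DEFLATE compression that PNG builds on) stands in for a real PNG codec. The names `raw_store`, `encoded_store`, `read_raw` and `read_encoded` are made up for this illustration.

```python
import timeit
import zlib

# Synthetic 28x28 grayscale "image": black background with a bright square.
# This only loosely imitates an MNIST digit (assumption for illustration).
WIDTH = HEIGHT = 28
pixels = bytearray(WIDTH * HEIGHT)
for row in range(8, 20):
    for col in range(8, 20):
        pixels[row * WIDTH + col] = 255
raw_image = bytes(pixels)

# zlib stands in for PNG encoding here; a real PNG adds filtering and
# headers on top of the same DEFLATE compression.
encoded_image = zlib.compress(raw_image)

N = 10_000
raw_store = [raw_image] * N          # "array of pixel values" style storage
encoded_store = [encoded_image] * N  # "encoded image" style storage


def read_raw(store):
    # Plain indexing: the pixels are already available.
    for i in range(len(store)):
        store[i]


def read_encoded(store):
    # Every access pays for a decoding step.
    for i in range(len(store)):
        zlib.decompress(store[i])


raw_seconds = timeit.timeit(lambda: read_raw(raw_store), number=1)
encoded_seconds = timeit.timeit(lambda: read_encoded(encoded_store), number=1)

print(f"bytes per image: raw={len(raw_image)}, encoded={len(encoded_image)}")
print(f"raw access:     {raw_seconds:.4f} s")
print(f"decoded access: {encoded_seconds:.4f} s")
```

The encoded copy takes fewer bytes, but each read has to decompress it first, which is the same kind of per-example cost the benchmark above is measuring. Absolute numbers will differ from a real PNG decoder.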