I’m using IterableDataset to read large datasets (larger than 100 GB). I don’t know how many rows they contain, and counting them could itself take quite a while.
In my opinion, Hugging Face should simply have a notion of end-of-dataset for IterableDataset. I believe this already exists in PyTorch/TensorFlow, and it could be used to mark when an epoch is finished without requiring the number of rows.
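
To make the idea concrete, here is a minimal sketch in plain Python. It does not use the datasets API. StreamingRows, run_epochs and fake_stream are hypothetical names made up for illustration, and the five-row stream is a placeholder for a real reader. The point is only that exhausting the iterator can serve as the end-of-epoch signal, so no row count is needed.

```python
from typing import Callable, Iterable, Iterator, List


class StreamingRows:
    """Hypothetical stand-in for an iterable-style dataset of unknown length."""

    def __init__(self, make_stream: Callable[[], Iterable[dict]]):
        # make_stream is called once per epoch to get a fresh pass over the data.
        self._make_stream = make_stream

    def __iter__(self) -> Iterator[dict]:
        # No __len__ is defined: exhaustion of the iterator is the only end signal.
        yield from self._make_stream()


def run_epochs(dataset: Iterable[dict], num_epochs: int) -> List[int]:
    """Iterate for num_epochs and return the number of rows seen in each epoch."""
    rows_per_epoch = []
    for _ in range(num_epochs):
        seen = 0
        for _row in dataset:  # the loop exits when the iterator is exhausted
            seen += 1
        rows_per_epoch.append(seen)  # reaching this line marks the end of an epoch
    return rows_per_epoch


def fake_stream():
    # Placeholder for reading a large file or remote shard row by row.
    for i in range(5):
        yield {"id": i}


counts = run_epochs(StreamingRows(fake_stream), num_epochs=2)
print(counts)
```

The row count here is a by-product of iterating, not something the loop has to know in advance.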