It seems that on datasets 3.4.1, chaining repeat + interleave_datasets + shuffle (in that order) and then iterating through a PyTorch DataLoader with worker processes raises this error:
NotImplementedError: <class 'datasets.iterable_dataset.RepeatExamplesIterable'> doesn't implement num_shards yet
Moving repeat to the end seems to result in the same error, while interleave_datasets + shuffle without repeat works fine. Is this a newly introduced issue? I can’t find it reported anywhere. Here is the full trace:
for b in tqdm.tqdm(loader):
File "/gpfs/data/oermannlab/users/xl3942/.conda/envs/simdino/lib/python3.11/site-packages/tqdm/std.py", line 1181, in __iter__
for obj in iterable:
File "/gpfs/data/oermannlab/users/xl3942/.conda/envs/simdino/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 701, in __next__
data = self._next_data()
^^^^^^^^^^^^^^^^^
File "/gpfs/data/oermannlab/users/xl3942/.conda/envs/simdino/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1465, in _next_data
return self._process_data(data)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/gpfs/data/oermannlab/users/xl3942/.conda/envs/simdino/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1491, in _process_data
data.reraise()
File "/gpfs/data/oermannlab/users/xl3942/.conda/envs/simdino/lib/python3.11/site-packages/torch/_utils.py", line 715, in reraise
raise exception
NotImplementedError: Caught NotImplementedError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/gpfs/data/oermannlab/users/xl3942/.conda/envs/simdino/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 351, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
^^^^^^^^^^^^^^^^^^^^
File "/gpfs/data/oermannlab/users/xl3942/.conda/envs/simdino/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 33, in fetch
data.append(next(self.dataset_iter))
^^^^^^^^^^^^^^^^^^^^^^^
File "/gpfs/data/oermannlab/users/xl3942/.conda/envs/simdino/lib/python3.11/site-packages/datasets/iterable_dataset.py", line 2252, in __iter__
yield from self._iter_pytorch()
File "/gpfs/data/oermannlab/users/xl3942/.conda/envs/simdino/lib/python3.11/site-packages/datasets/iterable_dataset.py", line 2132, in _iter_pytorch
if self._is_main_process() and ex_iterable.num_shards < worker_info.num_workers:
^^^^^^^^^^^^^^^^^^^^^^
File "/gpfs/data/oermannlab/users/xl3942/.conda/envs/simdino/lib/python3.11/site-packages/datasets/iterable_dataset.py", line 1911, in num_shards
return self.ex_iterable.num_shards
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gpfs/data/oermannlab/users/xl3942/.conda/envs/simdino/lib/python3.11/site-packages/datasets/iterable_dataset.py", line 1562, in num_shards
return self.ex_iterable.num_shards
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gpfs/data/oermannlab/users/xl3942/.conda/envs/simdino/lib/python3.11/site-packages/datasets/iterable_dataset.py", line 737, in num_shards
return min(ex_iterable.num_shards for ex_iterable in self.ex_iterables)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gpfs/data/oermannlab/users/xl3942/.conda/envs/simdino/lib/python3.11/site-packages/datasets/iterable_dataset.py", line 737, in <genexpr>
return min(ex_iterable.num_shards for ex_iterable in self.ex_iterables)
^^^^^^^^^^^^^^^^^^^^^^
File "/gpfs/data/oermannlab/users/xl3942/.conda/envs/simdino/lib/python3.11/site-packages/datasets/iterable_dataset.py", line 183, in num_shards
raise NotImplementedError(f"{type(self)} doesn't implement num_shards yet")
NotImplementedError: <class 'datasets.iterable_dataset.RepeatExamplesIterable'> doesn't implement num_shards yet
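
To make the trace easier to follow, here is a minimal toy model of what it appears to show: the outer wrappers forward num_shards to the iterable they wrap, the interleave step takes the min over its children, and the lookup finally hits a class that falls back to the base implementation, which raises. All class names below are hypothetical stand-ins written for illustration, not the actual classes in datasets/iterable_dataset.py, and the sketch only assumes the delegation pattern visible in the frames above.

```python
class BaseExamplesIterable:
    """Hypothetical stand-in for the base class whose num_shards raises."""

    @property
    def num_shards(self) -> int:
        raise NotImplementedError(f"{type(self)} doesn't implement num_shards yet")


class SourceIterable(BaseExamplesIterable):
    """Toy leaf iterable that knows its own shard count."""

    def __init__(self, n_shards: int):
        self._n_shards = n_shards

    @property
    def num_shards(self) -> int:
        return self._n_shards


class RepeatIterable(BaseExamplesIterable):
    """Toy repeat wrapper that does NOT override num_shards (assumed gap)."""

    def __init__(self, ex_iterable: BaseExamplesIterable, num_times: int):
        self.ex_iterable = ex_iterable
        self.num_times = num_times


class InterleaveIterable(BaseExamplesIterable):
    """Toy interleave wrapper: min over children, as in the frame at line 737."""

    def __init__(self, ex_iterables: list):
        self.ex_iterables = ex_iterables

    @property
    def num_shards(self) -> int:
        return min(ex_iterable.num_shards for ex_iterable in self.ex_iterables)


class ShuffleIterable(BaseExamplesIterable):
    """Toy shuffle wrapper: forwards num_shards to the wrapped iterable."""

    def __init__(self, ex_iterable: BaseExamplesIterable):
        self.ex_iterable = ex_iterable

    @property
    def num_shards(self) -> int:
        return self.ex_iterable.num_shards


def describe_num_shards(ex_iterable: BaseExamplesIterable) -> str:
    """Return the shard count as text, or the error message if the lookup fails."""
    try:
        return str(ex_iterable.num_shards)
    except NotImplementedError as err:
        return f"NotImplementedError: {err}"


a, b = SourceIterable(4), SourceIterable(2)

# interleave + shuffle only: every link in the chain answers
works = ShuffleIterable(InterleaveIterable([a, b]))

# repeat -> interleave -> shuffle: the lookup reaches the repeat wrapper and raises
repeat_first = ShuffleIterable(InterleaveIterable([RepeatIterable(a, 2), b]))

# repeat moved to the end: the outermost wrapper itself raises
repeat_last = RepeatIterable(ShuffleIterable(InterleaveIterable([a, b])), 2)

for name, chain in [("works", works), ("repeat_first", repeat_first), ("repeat_last", repeat_last)]:
    print(name, "->", describe_num_shards(chain))
```

In this toy version, where repeat sits in the chain does not matter: as long as one wrapper lacks num_shards, any lookup that passes through it fails, which would match the observation that moving repeat to the end gives the same error. Whether the real RepeatExamplesIterable is simply missing that property in 3.4.1 is an assumption based on the error message.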