Tell me, so you have implemented a PyTorch IterableDataset that is calling 2 Hugging Face IterableDatasets for +ve and -ve class items, right?
I think using individual iterable datasets for the +ve and -ve classes can be a good idea, but the wrapper you are using on top of those datasets doesn't need to be an iterable dataset itself. The parent can be a normal (map-style) dataset. Have you tried this?
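To make the idea concrete, here is a rough sketch of a map-style parent over two class streams. Everything in it is an assumption for illustration: the class name AlternatingClassDataset, the epoch_size argument, and the even/odd rule for picking the class are all made up, and plain lists stand in for the two streaming datasets.

```python
from torch.utils.data import Dataset


class AlternatingClassDataset(Dataset):
    """Hypothetical map-style wrapper over two iterable class sources."""

    def __init__(self, pos_source, neg_source, epoch_size):
        # Any re-iterable objects work here; streaming datasets are assumed.
        self.sources = {1: pos_source, 0: neg_source}
        self.iters = {}
        # Nominal epoch length, since the streams have no natural length.
        self.epoch_size = epoch_size

    def __len__(self):
        return self.epoch_size

    def _next_item(self, label):
        # Lazily create the iterator for this class on first use.
        if label not in self.iters:
            self.iters[label] = iter(self.sources[label])
        try:
            return next(self.iters[label])
        except StopIteration:
            # Restart the stream once it is exhausted.
            self.iters[label] = iter(self.sources[label])
            return next(self.iters[label])

    def __getitem__(self, index):
        # The index only selects the class: even -> +ve, odd -> -ve.
        label = 1 if index % 2 == 0 else 0
        return self._next_item(label), label


pairs = AlternatingClassDataset(["a", "b"], ["x"], epoch_size=6)
samples = [pairs[i] for i in range(len(pairs))]
```

Note that the index does not address a specific item here, so shuffling in the DataLoader would only change the order of the classes, and with num_workers > 0 each worker would hold its own copy of the two iterators.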
Or maybe you can try implementing a single iterable dataset in PyTorch that loads both the +ve and -ve classes.
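A minimal sketch of what such a single iterable dataset might look like is below. The names MixedClassStream, pos_prob and seed are hypothetical, plain lists stand in for the two streaming sources, and stopping as soon as either source runs out is just one possible choice.

```python
import random

from torch.utils.data import IterableDataset


class MixedClassStream(IterableDataset):
    """Hypothetical single iterable dataset that yields both classes."""

    def __init__(self, pos_source, neg_source, pos_prob=0.5, seed=0):
        self.pos_source = pos_source
        self.neg_source = neg_source
        self.pos_prob = pos_prob  # chance of drawing a +ve item at each step
        self.seed = seed

    def __iter__(self):
        # A fresh RNG per pass makes the class order reproducible.
        rng = random.Random(self.seed)
        pos_it, neg_it = iter(self.pos_source), iter(self.neg_source)
        while True:
            if rng.random() < self.pos_prob:
                it, label = pos_it, 1
            else:
                it, label = neg_it, 0
            try:
                item = next(it)
            except StopIteration:
                # Assumption: end the pass when either source is exhausted.
                return
            yield item, label


stream = MixedClassStream(["a", "b", "c"], ["x", "y", "z"])
mixed = list(stream)
```

This can be passed to a DataLoader with a batch_size as usual. With num_workers > 0, every worker would replay the same stream unless the sources are sharded per worker (for example via torch.utils.data.get_worker_info()).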
I opened this issue: "Big text dataset loading for training". Any insights you can share?