I am trying to implement a custom Siamese dataset using Hugging Face Datasets, with the goal of eventually publishing it on the Hub.
I have a list of positive pairs and I generate negative pairs on the fly during training (the number of possible negative pairs is huge and it would be inefficient to store them all). I have not seen how to do that in the docs.
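
To make it concrete, here is a minimal sketch of the kind of on-the-fly sampling I mean. It is plain Python with no Datasets or PyTorch code, and the toy ids and the names `sample_negative` and `iter_training_pairs` are made up for illustration; my real sampling is more involved.

```python
import random

# Hypothetical toy data: each stored positive pair is (anchor_id, match_id).
positive_pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
positive_set = set(positive_pairs)
# Pool of candidate partners to draw negatives from.
right_items = sorted({match for _, match in positive_pairs})


def sample_negative(anchor, rng):
    """Draw a random partner for `anchor` that is not a known positive."""
    # Rejection sampling: assumes every anchor has at least one non-positive partner.
    while True:
        candidate = rng.choice(right_items)
        if (anchor, candidate) not in positive_set:
            return anchor, candidate


def iter_training_pairs(rng):
    """Yield each stored positive (label 1) followed by one fresh negative (label 0)."""
    for anchor, match in positive_pairs:
        yield anchor, match, 1
        neg_anchor, neg_match = sample_negative(anchor, rng)
        yield neg_anchor, neg_match, 0


pairs = list(iter_training_pairs(random.Random(0)))
```

Only the positive pairs are stored; the negatives are drawn again on every pass over the data, which is the part I do not know how to express in a published dataset.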
Am I missing something, or should I really just use a regular torch.utils.data.Dataset subclass and give up on publishing it?
Thanks a lot for your help!