Custom Siamese dataset

Hi,

I am trying to implement a custom Siamese dataset with Hugging Face Datasets so that I can eventually publish it on the Hub.

I have a list of positive pairs, and I generate negative pairs on the fly during training, since the number of possible negative pairs is huge and storing them all would be inefficient. I have not found a way to do this in the docs.
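
To make the setup concrete, here is a rough sketch of the kind of on-the-fly sampling I mean. It is plain Python and does not use the Datasets API at all. The class name, the field names ("a", "b", "label"), and the even/odd indexing scheme are placeholders I made up for illustration, not my actual code.

```python
import random


class SiamesePairSampler:
    """Hypothetical sketch: stores positive pairs only, draws negatives at access time."""

    def __init__(self, positive_pairs, seed=None, max_tries=1000):
        self.positive_pairs = list(positive_pairs)
        # Unordered lookup so (x, y) and (y, x) both count as a known positive.
        self.positive_set = {frozenset(p) for p in self.positive_pairs}
        # Every item that appears in at least one positive pair.
        self.items = sorted({x for pair in self.positive_pairs for x in pair})
        self.rng = random.Random(seed)
        self.max_tries = max_tries

    def __len__(self):
        # Assumption: one positive and one sampled negative per stored pair.
        return 2 * len(self.positive_pairs)

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        a, b = self.positive_pairs[idx // 2]
        if idx % 2 == 0:
            # Even indices return the stored positive pair.
            return {"a": a, "b": b, "label": 1}
        # Odd indices return a freshly sampled negative for the same anchor.
        for _ in range(self.max_tries):
            other = self.rng.choice(self.items)
            if other != a and frozenset((a, other)) not in self.positive_set:
                return {"a": a, "b": other, "label": 0}
        raise RuntimeError("could not sample a negative pair for this anchor")


pairs = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]
sampler = SiamesePairSampler(pairs, seed=0)
positive = sampler[0]  # the stored pair ("a1", "a2") with label 1
negative = sampler[1]  # anchor "a1" with a randomly drawn non-matching item
```

Only the positive pairs are stored here, and the negative returned for a given index can differ between accesses, which is the behavior I would like to keep.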

Am I missing something or should I really just use a regular torch.utils.data.Dataset subclass and give up on publishing it?

Thanks a lot for your help!