Using a Hugging Face dataset class as a PyTorch dataset class


I have created a custom dataset class using Hugging Face, and I would like to use it as a PyTorch dataset class (with __getitem__, etc.).

Is it possible?


This is possible by default. What exactly do you want to do? You can use such a dataset directly in a PyTorch DataLoader, as long as you set the format to torch. For instance:

dataset.set_format(type='torch', columns=['input_ids', 'token_type_ids', 'attention_mask', 'labels'])


I would like to apply data augmentation to a dataset of images that is an instance of my custom Hugging Face dataset class. To make this easier, I’d like to convert it to a PyTorch dataset so that I can pass a “transform=” argument when I instantiate my dataset class.

An example: train_loader =, train=True, transform=transform_train))

Here, CIFAR10 is a dataset class provided by PyTorch (torchvision), so it accepts a transform argument. I would like to do the same thing with my own dataset.

Is it possible?


Not sure. Maybe others can help.
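
For what it’s worth, one approach that might work is to wrap the Hugging Face dataset in a small PyTorch Dataset subclass that takes a transform argument, the same way the torchvision datasets do. This is only a sketch: TransformedDataset, image_key and flip_horizontal are hypothetical names, a plain list of dicts stands in for the real Hugging Face dataset, and the "image" and "label" column names are assumptions.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class TransformedDataset(Dataset):
    """Hypothetical wrapper that applies `transform` to one column of each example."""

    def __init__(self, hf_dataset, transform=None, image_key="image"):
        # hf_dataset: anything with __len__ and integer __getitem__ returning a dict
        self.hf_dataset = hf_dataset
        self.transform = transform
        self.image_key = image_key

    def __len__(self):
        return len(self.hf_dataset)

    def __getitem__(self, idx):
        # Shallow-copy the row so the source example is not modified.
        example = dict(self.hf_dataset[idx])
        if self.transform is not None:
            example[self.image_key] = self.transform(example[self.image_key])
        return example


def flip_horizontal(img):
    # Toy augmentation: mirror the image along its last (width) axis.
    return torch.flip(img, dims=[-1])


# Stand-in for the Hugging Face dataset: four fake 3x2x2 "images" with labels.
rows = [{"image": torch.arange(12.0).reshape(3, 2, 2) + i, "label": i % 2} for i in range(4)]

train_set = TransformedDataset(rows, transform=flip_horizontal)
train_loader = DataLoader(train_set, batch_size=2, shuffle=False)
batch = next(iter(train_loader))
```

With a real image dataset, the transform would typically be a torchvision transform pipeline ending in a tensor conversion so that the default collate function can stack the images. The datasets library also has set_transform and with_transform methods on Dataset for applying a function on the fly, which may remove the need for a wrapper; check the documentation for the version you are using.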