How to use Dataset with Pytorch Lightning

nielsr · April 13, 2021, 10:14am

I think you also need to specify which columns you’d like to keep when calling .set_format(type='torch'). If you don’t, the text columns remain part of the dataset, and trying to convert their strings to PyTorch tensors raises an error.

So I think you just need to update that line to:

train_dataset.set_format(type='torch', columns=['input_ids', 'token_type_ids', 'attention_mask', 'label'])
