Returns list of tensors instead of tensors with set_format in datasets

batsergelen · November 17, 2021, 3:46am

I'm having an issue with datasets' set_format function: I expect to get a single tensor, but I get a list of tensors instead. I get normal torch tensors for 1D and 2D lists, but when I pass 3D lists, it returns a list of tensors.
Here is an example:

from datasets import Dataset

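# 'a' holds one flat list per row (2D overall)
ex1 = {'a':[[1,1],[1,2]], 'b':[1,1]}
# 'a' holds one nested 2x2 list per row (3D overall)
ex2 = {'a':[[[2,1],[2,2]], [[3,1],[3,2]]], 'b':[1,1]}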

d1 = Dataset.from_dict(ex1)
d1.set_format('torch', columns=['a','b'])
d2 = Dataset.from_dict(ex2)
d2.set_format('torch', columns=['a','b'])

print(d1[:2])
print(d2[:2])

and the output is:

{'a': tensor([[1, 1],
        [1, 2]]), 'b': tensor([1, 1])}
{'a': [[tensor([2, 1]), tensor([2, 2])], [tensor([3, 1]), tensor([3, 2])]], 'b': tensor([1, 1])}

I was expecting to get a single 3D tensor for d2. Why is it returning a list?
I would appreciate any clarification. Thank you.

binkjakub · March 8, 2022, 5:57pm

I encountered exactly the same problem: if the array has 3 dimensions, set_format fails and I get a list of 2D tensors instead of one 3D tensor. To solve this, simply cast the column to the Array3D type, e.g. with the Dataset.cast_column function.
