Hello,
I would like to split my dataset into train and test samples. My dataset was initially created with a dict.
It looks like this:
import numpy as np
from datasets import Dataset
data = {"text": ["This is a sentence"]*100, "extra_data": np.random.randint(0, 10, size=(100, 5)), "labels": np.random.randint(0, 3, size=(100,))}
ds = Dataset.from_dict(data)
However, when I try to split it with:
train_test = ds.train_test_split(test_size=0.2)
I get this error message:
pyarrow.lib.ArrowTypeError: Did not pass numpy.dtype object
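One thing that may be worth trying, sketched here under the unconfirmed assumption that the error is triggered by the NumPy arrays stored in the dict: convert every array to plain Python lists before building the dataset. The name `plain_data` is only illustrative.

```python
import numpy as np

# Same dict as above, with two columns held as NumPy arrays.
data = {
    "text": ["This is a sentence"] * 100,
    "extra_data": np.random.randint(0, 10, size=(100, 5)),
    "labels": np.random.randint(0, 3, size=(100,)),
}

# Assumption: the NumPy arrays are what trips up the split.
# Convert each array to (nested) Python lists; leave other values untouched.
plain_data = {
    key: value.tolist() if isinstance(value, np.ndarray) else value
    for key, value in data.items()
}

print({key: type(value).__name__ for key, value in plain_data.items()})
```

`plain_data` could then be passed to `Dataset.from_dict` in place of `data`, followed by the same `train_test_split(test_size=0.2)` call. Whether this actually avoids the error may depend on the installed `datasets` and `pyarrow` versions.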
Thanks