add_column() does not work on a dataset sliced with select()

Hello all, say I have a dataset with 2000 entries:

dataset = Dataset.from_dict({'colA': list(range(2000))})

From it, I want to extract the first 1000 rows into a new dataset and then add a new column to that dataset:

dataset2 =
final_dataset = dataset2.add_column('colB', list(range(1000)))

This gives an error

ArrowInvalid: Added column's length must match table's length. Expected length 2000 but got length 1000

I've experimented with the arguments of the select method, but I have not found a way around this error. Does anyone know why it happens and how to resolve it?


Hi! Could you please open an issue in our GitHub repo? This looks like a bug in datasets.

In the meantime, call flatten_indices (dset.flatten_indices()) after select and before add_column.


Will do. What you suggested works as well, thanks.