I have a dataset of texts that I want to split into shorter texts

I have a bunch of long texts in a dataset. I want to write a map function that splits these long samples into multiple shorter samples. Can this be done with Datasets? I saw some mentions of returning a list of row dictionaries. I tried this and it did not work. I also tried returning a single dict with lists of what should go in the columns. I get errors out of pyarrow either way. Any suggestions about how I should go about doing this? Thanks

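This is possible in batched map mode (batched=True), as explained here. In that mode the function receives a dict of lists and may return a different number of rows than it was given. Note that map requires all the columns in the returned batch to have the same length, so either pass remove_columns=dataset.column_names to drop the original columns, or transform the remaining columns so they match the new length. Otherwise you get an error.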
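
For illustration, here is a minimal sketch of both options. The column names "text" and "label", the helper names, and the fixed character-based chunk size are assumptions made for this example, not something from your dataset. The functions work on plain dicts of lists, which is the shape batched map passes in, and the map calls themselves are shown as comments at the bottom.

```python
def split_text(text, max_chars=20):
    """Cut one string into consecutive pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def chunk_batch(batch, max_chars=20):
    """Option 1: return only the new text column.

    The output has more rows than the input, so the original columns
    must be dropped with remove_columns when calling map.
    """
    chunks = []
    for text in batch["text"]:
        chunks.extend(split_text(text, max_chars))
    return {"text": chunks}


def chunk_batch_keep_label(batch, max_chars=20):
    """Option 2: keep the (assumed) "label" column by repeating each
    label once per chunk, so both columns end up the same length."""
    texts, labels = [], []
    for text, label in zip(batch["text"], batch["label"]):
        pieces = split_text(text, max_chars)
        texts.extend(pieces)
        labels.extend([label] * len(pieces))
    return {"text": texts, "label": labels}


# Usage with a datasets.Dataset named `dataset` (assumed to have the
# columns above):
#
#   short = dataset.map(
#       chunk_batch,
#       batched=True,
#       remove_columns=dataset.column_names,
#   )
#
#   short_with_labels = dataset.map(chunk_batch_keep_label, batched=True)
```

Splitting on characters is only a placeholder here. You would probably want to split on sentences or tokens instead, and the rest of the pattern stays the same.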