Flatten List of features


I’m trying to flatten a list of features, but I don’t know how to do that :sweat:
So I have a dataset like this:

    {"data": [
         {"a": 1},
         {"a": 2}
    ]},
    {"data": [
         {"a": 3},
         {"a": 4}
    ]}

and want to transform it into a dataset like this:

    {"data.a": 1},
    {"data.a": 2},
    {"data.a": 3},
    {"data.a": 4},

How can I achieve this?

Hi! You can use Dataset.map for that:

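    def flatten_list_of_dict(batch):
        # batch["data"] holds one list of dicts per row; collect every "a" into a single flat column
        return {"data.a": [dic["a"] for ex_list_of_dict in batch["data"] for dic in ex_list_of_dict]}

    # batched=True lets the function return more rows than it received; the old column has to be removed
    ds = ds.map(flatten_list_of_dict, batched=True, remove_columns=["data"])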
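
To see what the function does to one batch, here is a small plain-Python sketch. It assumes that with `batched=True` the function receives a dict mapping each column name to a list of values, one per row. The `batch` dict below is hand-written to mimic the two rows from the question; it is not produced by the library.

```python
def flatten_list_of_dict(batch):
    # batch["data"] is a list with one entry per row; each entry is a list of dicts
    return {"data.a": [dic["a"] for ex_list_of_dict in batch["data"] for dic in ex_list_of_dict]}

# Hypothetical batch mimicking the two rows from the question
batch = {"data": [[{"a": 1}, {"a": 2}], [{"a": 3}, {"a": 4}]]}

flat = flatten_list_of_dict(batch)
print(flat)  # {'data.a': [1, 2, 3, 4]}
```

Two input rows become four output values, which is why the original `data` column is dropped with `remove_columns`: the new column no longer has the same length as the old one.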