Setting format of columns for nested dictionary datasets with set_format

Hi,

Is there a way to set the format of nested columns when the main columns are Python dictionaries?

A batch would have the following format:

examples = {
    "id": [],
    "source": {
        "something": [],
        "something_more": [],
    },
    "target": {
        "something": [],
        "something_more": [],
    },
}

I want to set the format of "something" and "something_more".

Thanks!

Hi! Unfortunately, this is not currently possible.
What I would suggest is either:

  • unnest your columns using dataset.flatten()
  • OR use your own formatting transform with dataset.set_transform. This lets you format your nested columns however you want. The transform should take a Python dictionary as input and return the formatted data.
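
To make the second option more concrete, here is a minimal sketch of such a transform. The names make_nested_formatter, nested_keys, convert and to_floats are made up for illustration and are not part of the datasets library. The sketch assumes nested columns may reach the transform either as a dict of lists (as written in the question) or as a list of per-row dicts, and handles both. The converter is a plain-Python stand-in; in practice you would probably pass something like numpy.asarray or torch.tensor.

```python
def make_nested_formatter(nested_keys, convert):
    """Return a transform that could be passed to dataset.set_transform.

    nested_keys: maps a top-level column name to the sub-keys to convert,
                 e.g. {"source": ["something"], "target": ["something"]}
    convert:     any callable applied to each selected value
                 (hypothetical stand-in for numpy.asarray, torch.tensor, ...)
    """

    def convert_struct(struct, keys):
        # Copy the dict, converting only the requested sub-keys.
        return {k: convert(v) if k in keys else v for k, v in struct.items()}

    def transform(batch):
        formatted = dict(batch)  # shallow copy; other columns pass through
        for column, keys in nested_keys.items():
            value = batch[column]
            if isinstance(value, dict):
                # dict-of-lists layout, as written in the question
                formatted[column] = convert_struct(value, keys)
            else:
                # list-of-dicts layout: one dict per row
                formatted[column] = [convert_struct(row, keys) for row in value]
        return formatted

    return transform


def to_floats(values):
    # Stand-in converter so the sketch runs without extra dependencies.
    return [float(x) for x in values]


transform = make_nested_formatter(
    {
        "source": ["something", "something_more"],
        "target": ["something", "something_more"],
    },
    convert=to_floats,
)

batch = {
    "id": ["a", "b"],
    "source": {"something": [1, 2], "something_more": [3, 4]},
    "target": {"something": [5, 6], "something_more": [7, 8]},
}
formatted = transform(batch)
```

With a real dataset you would then call dataset.set_transform(transform), and the transform is applied on the fly whenever rows are accessed. For the first option, dataset.flatten() turns the nested fields into top-level columns named like source.something and target.something_more, which you can then list in the columns argument of set_format.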