Datasets with custom Python objects

Hi, thanks for the library! I would like to have a Hugging Face Dataset in which one of the columns holds custom (non-serializable) Python objects. For example, a minimal reproduction:

import datasets

class MyClass:
    pass

# Fails: Arrow cannot infer a type for MyClass instances
dataset = datasets.Dataset.from_list([
    dict(a=MyClass(), b='hello'),
])

It gives this error:

ArrowInvalid: Could not convert <__main__.MyClass object at 0x7a852830d050> with type MyClass: did not recognize Python value type when inferring an Arrow data type

I guess this is because Dataset converts everything to Arrow format. Is there any way to make this scenario work? Thanks!

Hi! Indeed, Arrow doesn’t support serializing arbitrary objects; it only supports types like integers, floats, strings, lists, dicts, etc.

You could try converting your objects to JSON (a string) or to Pickle (bytes) before adding them to the dataset, and then rebuild each object from that when needed.
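A minimal sketch of the JSON route, assuming the object's attributes are themselves JSON-serializable. The `value` attribute and the `encode`, `decode`, and `build_dataset` helpers are hypothetical names used only for illustration; they are not part of the library.

```python
import json


class MyClass:
    # Hypothetical example class with a single JSON-friendly attribute.
    def __init__(self, value):
        self.value = value


def encode(obj):
    # Store the object's attributes as a JSON string, a type Arrow can hold.
    return json.dumps(vars(obj))


def decode(text):
    # Rebuild the object from the stored JSON string.
    return MyClass(**json.loads(text))


# The custom object is replaced by its string form before reaching Arrow.
rows = [dict(a=encode(MyClass(value=42)), b='hello')]


def build_dataset(rows):
    # Imported here so the helpers above run even without the library installed.
    import datasets
    return datasets.Dataset.from_list(rows)


# Rebuild the object from a stored row when it is needed.
restored = decode(rows[0]['a'])
```

Calling `build_dataset(rows)` should then succeed, since column `a` now contains plain strings, and `decode(dataset[0]['a'])` would give back a `MyClass` instance. For objects that are not JSON-friendly, swapping in `pickle.dumps` and `pickle.loads` stores bytes instead; only unpickle data you trust.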

Thank you!
