save_to_disk loses formatting information

Hi. I set some columns to the NumPy ("np") format with set_format, call save_to_disk, and then load the dataset back with load_from_disk, but the formatting is lost: the column comes back as a list of floats. What am I doing wrong?
Here is code to reproduce the issue:

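import datasets
from datasets import load_dataset
from sentence_transformers import SentenceTransformer

ds = load_dataset("Bingsu/Cat_and_Dog", split="train")
ds = ds.select(range(0, 5))
model = SentenceTransformer('clip-ViT-B-32')
# Compute a CLIP embedding for every image in the batch
processor = lambda examples: {'embedding': model.encode(examples['image'])}
ds = ds.map(processor, batched=True, batch_size=32)
# Return "embedding" as NumPy arrays; the other columns keep the default Python format
ds.set_format("np", columns=["embedding"], output_all_columns=True)
print(ds)
print(ds.format)
x = ds[0]['embedding']
print(type(x))
print(x.shape)
file_name = "cats-dogs-resnet-embeddings.hf"
ds.save_to_disk(file_name)


# Reload the dataset from disk and inspect its format again
ds2 = datasets.load_from_disk(file_name)
print(ds2)
print(ds2.format)
x = ds2[0]['embedding']
print(type(x))
print(type(x[0]))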

This produces:

Dataset({
    features: ['image', 'labels', 'embedding'],
    num_rows: 5
})
{'type': 'numpy', 'format_kwargs': {}, 'columns': ['embedding'], 'output_all_columns': True}
<class 'numpy.ndarray'>
(512,)


Dataset({
    features: ['image', 'labels', 'embedding'],
    num_rows: 5
})
{'type': None, 'format_kwargs': {}, 'columns': ['image', 'labels', 'embedding'], 'output_all_columns': False}
<class 'list'>
<class 'float'>

Hi! This looks like a bug: we save the format state in save_to_disk, but we don't restore it in load_from_disk.

EDIT:

I’ve reported this bug here. Feel free to self-assign it if you are interested in fixing it yourself (and I can give you some pointers if needed).