I’m trying to do the exact same thing mentioned in the documentation:
from datasets import Dataset
audio_dataset_amr = Dataset.from_dict({"audio": ["audio_samples/audio.amr"]})
def decode_audio_with_pydub(batch):
    return batch
audio_dataset_amr.set_transform(decode_audio_with_pydub)
audio_dataset_amr.save_to_disk(f"./transformed_dataset")
But it fails with the following error:
Exception has occurred: TypeError
Object of type function is not JSON serializable
The format kwargs must be JSON serializable, but key 'transform' isn't.
TypeError: Object of type function is not JSON serializable
During handling of the above exception, another exception occurred:
File "/home/mehran/tmp/locale_classifier/transform_dataset_test.py", line 59, in <module>
audio_dataset_amr.save_to_disk(f"./transformed_dataset")
TypeError: Object of type function is not JSON serializable
The format kwargs must be JSON serializable, but key 'transform' isn't.
Is this a bug or am I doing something wrong?
BTW, if I comment out the set_transform call, everything works just fine.
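For what it's worth, the first line of the error can be reproduced with the standard library alone, which suggests the failure comes from trying to JSON-encode the transform function itself. The snippet below is only a minimal sketch under that assumption: `format_kwargs` and `first_unserializable_key` are hypothetical names made up for illustration, not the actual internals of `datasets`.

```python
import json


def decode_audio_with_pydub(batch):
    # Same no-op transform as in the snippet above.
    return batch


# Hypothetical stand-in for the "format kwargs" named in the error message;
# this is NOT the real internal state of the datasets library.
format_kwargs = {"transform": decode_audio_with_pydub}


def first_unserializable_key(kwargs):
    """Return the first key whose value json.dumps rejects, or None."""
    for key, value in kwargs.items():
        try:
            json.dumps(value)
        except TypeError:
            # json.dumps raises TypeError for objects it cannot encode,
            # such as plain Python functions.
            return key
    return None


print(first_unserializable_key(format_kwargs))
```

If that is indeed what happens, it would explain why saving works once the set_transform call is commented out: without a transform, there is no function object left to serialize.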