I am working with audio files and have code that creates a Hugging Face dataset from a pandas DataFrame.
Creating the dataset is quite fast. However, if I run something like hg_data['train']['audio'][0]
to check whether it can load the audio (as a NumPy array), my SageMaker notebook crashes (even with 16 GB of memory). The same happens if I run:
import IPython.display as ipd
ipd.Audio(data['train']['audio'][0], rate=22050)
P.S. My features are defined like this:
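from datasets import Audio, ClassLabel, Dataset, Features, Value

# unique_genres (list of genre names) and _df_train (pandas DataFrame) are defined earlier
features = Features({
    'tag': ClassLabel(names=unique_genres, id=None),
    'SegID': Value(dtype='int32', id=None),
    'value': Value(dtype='int64', id=None),
    'audio': Value(dtype='string', id=None)  # path to the audio file
})
_df_train = Dataset.from_pandas(_df_train, features=features)
# Cast the string paths to Audio so files are decoded and resampled on access
_df_train = _df_train.cast_column("audio", Audio(sampling_rate=16000))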
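
In case it helps narrow this down: my understanding (which may be wrong, and may depend on the datasets version) is that indexing the column first, as in ['audio'][0], decodes every file in the column before taking the first element, whereas indexing the row first, as in [0]['audio'], decodes only one file. The toy sketch below is not the datasets library; ToyAudioDataset and its decode counter are hypothetical stand-ins I made up only to illustrate the difference between the two access patterns.

```python
# Toy stand-in for a dataset with a lazily decoded "audio" column.
# This is NOT the datasets library; it only counts how many files get decoded.

class ToyAudioDataset:
    def __init__(self, paths):
        self.paths = list(paths)
        self.decode_calls = 0  # number of files "decoded" so far

    def _decode(self, path):
        # Pretend to read and resample one audio file.
        self.decode_calls += 1
        return {"path": path, "array": [0.0] * 4, "sampling_rate": 16000}

    def __getitem__(self, key):
        if isinstance(key, str):
            # Column access: every row of the column is decoded.
            return [self._decode(p) for p in self.paths]
        # Row access: only the requested row is decoded.
        return {"audio": self._decode(self.paths[key])}


paths = [f"clip_{i}.wav" for i in range(1000)]

# Column first, then row: decodes all 1000 files to return one.
column_first = ToyAudioDataset(paths)
_ = column_first["audio"][0]
print(column_first.decode_calls)

# Row first, then column: decodes a single file.
row_first = ToyAudioDataset(paths)
_ = row_first[0]["audio"]
print(row_first.decode_calls)
```

If the real dataset behaves like this toy model, then hg_data['train'][0]['audio'] should only need memory for one clip, which might explain why the column-first access runs out of memory.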