Hello, I am trying to fine-tune a Whisper model using my own dataset.
I have audio files in MPEG-4 (m4a) and a CSV file formatted as path_to_audio;sentence
I know Hugging Face Audio only supports wav/mp3, and I could use ffmpeg to convert the audio files to one of those, but I want to see if I can convert in code with pydub, without having to store extra files in a format I won't need later.
What I have tried so far is reading my paths into a list and converting each file to a raw audio byte array using AudioSegment.
Example code:
from pydub import AudioSegment
from datasets import Audio, Dataset

def m4a_to_raw(filename):
    return AudioSegment.from_file(filename, format="m4a").raw_data

raw = list(map(m4a_to_raw, list(dict.fromkeys(paths))))
dataset = Dataset.from_dict({"audio": raw}).cast_column("audio", Audio())
What I'm getting:
soundfile.LibsndfileError: Error opening <_io.BytesIO object at 0x7f2a85cc46d0>: Format not recognised.
I'm quite new to both Python and Hugging Face, so I might be trying something really strange here.
Is there a way to get Audio from the raw data?
If so, what am I missing?
If not, is my only choice to actually convert and save all files in mp3 or wav format?
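For what it's worth, one possible explanation for the error is that raw_data holds only the decoded PCM samples, with no container header, so a decoder has nothing to detect a format from. Below is a minimal sketch of wrapping such bytes in a WAV container entirely in memory, using only the standard library. The helper name pcm_to_wav_bytes and the silent test samples are made up for illustration, and I have not run this against pydub or datasets.

```python
import io
import wave


def pcm_to_wav_bytes(raw, channels, sample_width, frame_rate):
    """Wrap headerless PCM samples in an in-memory WAV container.

    Hypothetical helper: sample_width is in bytes per sample (2 for 16-bit).
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(frame_rate)
        wav.writeframes(raw)
    # wave does not close a file object it did not open, so buf is still usable.
    return buf.getvalue()


# Stand-in for AudioSegment.raw_data: 0.1 s of 16-bit mono silence at 16 kHz.
raw = b"\x00\x00" * 1600
wav_bytes = pcm_to_wav_bytes(raw, channels=1, sample_width=2, frame_rate=16000)

# The result now starts with a RIFF/WAVE header that decoders can recognise.
print(wav_bytes[:4], wav_bytes[8:12])
```

With pydub, the four arguments would presumably come from the AudioSegment attributes raw_data, channels, sample_width and frame_rate. My assumption is that the dataset column could then hold dictionaries of the form {"bytes": wav_bytes, "path": None} before calling cast_column("audio", Audio()), so nothing is written to disk.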