Loading MPEG-4 audio files to huggingface Dataset

IntraphoneMarlar · April 17, 2024, 9:40am

Hello, I am trying to fine-tune a Whisper model using my own dataset.

I have audio files in MPEG-4 (m4a) format and a CSV file with rows formatted as path_to_audio;sentence
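
For reference, a minimal sketch of reading a file in that layout with the standard library. It assumes there is no header row and that everything after the first `;` belongs to the sentence; `read_manifest` and the sample rows are made up for illustration.

```python
import csv
import io


def read_manifest(lines):
    """Return (paths, sentences) from `path_to_audio;sentence` rows.

    Assumes no header row. Rows whose path was already seen are skipped,
    so each audio file appears once and the original order is preserved.
    """
    paths, sentences = [], []
    seen = set()
    for row in csv.reader(lines, delimiter=";"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        path = row[0]
        sentence = ";".join(row[1:])  # keep any ';' inside the sentence
        if path in seen:
            continue
        seen.add(path)
        paths.append(path)
        sentences.append(sentence)
    return paths, sentences


# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO(
    "clips/a.m4a;hello there\n"
    "clips/b.m4a;good morning\n"
    "clips/a.m4a;hello there\n"
)
paths, sentences = read_manifest(sample)
```

With a real file, an open file handle (opened with `newline=""`) can be passed in place of the `StringIO` object.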

I know Hugging Face Audio only supports wav/mp3, and I could use ffmpeg to convert the audio files to one of those formats, but I want to see if I can convert them in code with pydub, without having to store extra files in a format I won't need later.

What I have tried so far is reading my paths into a list and converting each file to a raw audio byte array using AudioSegment.

Example code:

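from datasets import Audio, Dataset
from pydub import AudioSegment

def m4a_to_raw(filename):
    return AudioSegment.from_file(filename, format="m4a").raw_data

# paths is the list of file paths read from the CSV; dict.fromkeys drops duplicates
raw = list(map(m4a_to_raw, dict.fromkeys(paths)))

dataset = Dataset.from_dict({"audio": raw}).cast_column("audio", Audio())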

What I'm getting:

soundfile.LibsndfileError: Error opening <_io.BytesIO object at 0x7f2a85cc46d0>: Format not recognised.

I’m quite new to both Python and Hugging Face, so I might be trying something really strange here.

Is there a way to get Audio from the raw data?
If so, what am I missing?
If not, is my only choice to actually convert and save all files in mp3 or wav format?
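
One possible direction, offered as a sketch rather than a confirmed fix: pydub's `raw_data` holds only the decoded PCM samples, with no container header, which would explain why soundfile reports "Format not recognised". Wrapping those samples in a WAV container in memory gives bytes with a recognizable header, and no extra files are written. The helper names below (`pcm_to_wav_bytes`, `m4a_to_wav_bytes`, `build_dataset`) are hypothetical. The sketch assumes pydub can already decode the files (ffmpeg installed) and that `Audio()` decodes in-memory WAV bytes the same way it tried to decode the raw bytes above.

```python
import io
import wave


def pcm_to_wav_bytes(raw, sample_rate, channels, sample_width):
    """Wrap headerless PCM samples in a WAV container held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample, e.g. 2 for 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(raw)
    return buf.getvalue()


def m4a_to_wav_bytes(filename):
    """Decode an .m4a file with pydub and return WAV bytes (nothing saved to disk)."""
    from pydub import AudioSegment  # third-party, imported only when called

    seg = AudioSegment.from_file(filename, format="m4a")
    return pcm_to_wav_bytes(
        seg.raw_data, seg.frame_rate, seg.channels, seg.sample_width
    )


def build_dataset(paths, sentences):
    """Build a Dataset with an Audio column from in-memory WAV bytes.

    Assumes paths and sentences are aligned lists of equal length.
    """
    from datasets import Audio, Dataset  # third-party, imported only when called

    audio = [m4a_to_wav_bytes(p) for p in paths]
    ds = Dataset.from_dict({"audio": audio, "sentence": sentences})
    return ds.cast_column("audio", Audio())
```

pydub's `export` method with a `BytesIO` target and `format="wav"` should give an equivalent result. Whisper models expect 16 kHz input, so passing `sampling_rate=16000` to `Audio` may also be needed if the source files use a different rate.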
