Loading MPEG-4 audio files into a Hugging Face Dataset

Hello, I am trying to fine-tune a Whisper model on my own dataset.

I have audio files in MPEG-4 (m4a) format and a CSV file in which each line has the form path_to_audio;sentence

As far as I know, the Hugging Face Audio feature only supports WAV/MP3. I could use ffmpeg to convert the audio files to one of those, but I want to see whether I can convert them in code with pydub instead, without having to store extra files in a format I won't need later.

What I have tried so far is reading my paths into a list and converting each file to a raw audio byte array using AudioSegment.
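For context, here is a rough sketch of how the paths list could be built from the semicolon-delimited CSV. The helper name read_manifest and the sample rows are hypothetical, and it assumes the file has no header row.

```python
import csv
import io

def read_manifest(fileobj):
    """Return (paths, sentences) from a semicolon-delimited manifest.

    Assumes each line is path_to_audio;sentence with no header row.
    """
    paths, sentences = [], []
    for row in csv.reader(fileobj, delimiter=";"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        paths.append(row[0])
        sentences.append(row[1])
    return paths, sentences

# Hypothetical two-row manifest standing in for the real CSV file.
sample = io.StringIO("clips/a.m4a;hello world\nclips/b.m4a;good morning\n")
paths, sentences = read_manifest(sample)
```

With a real file, the same function should work on a handle opened with open(csv_path, newline="", encoding="utf-8"), where csv_path is whatever the CSV is called.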

Example code:

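from datasets import Audio, Dataset
from pydub import AudioSegment

def m4a_to_raw(filename):
    # Decode the m4a file and return its raw audio bytes
    return AudioSegment.from_file(filename, format="m4a").raw_data

# Deduplicate paths (order preserved), then convert each file
raw = list(map(m4a_to_raw, dict.fromkeys(paths)))

dataset = Dataset.from_dict({"audio": raw}).cast_column("audio", Audio())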

What I'm getting:

soundfile.LibsndfileError: Error opening <_io.BytesIO object at 0x7f2a85cc46d0>: Format not recognised.

I’m quite new to both Python and Hugging Face, so I might be trying something really strange here.

Is there a way to get Audio from the raw data?
If so, what am I missing?
If not, is my only choice to actually convert and save all files in MP3 or WAV format?
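
One guess about the error, which I have not confirmed: raw_data from AudioSegment is headerless PCM, so soundfile has no container header to recognise. Below is a minimal sketch of wrapping PCM bytes in a WAV container in memory using only the standard library. The helper name pcm_to_wav_bytes is hypothetical, and the silent test signal stands in for real decoded audio.

```python
import io
import wave

def pcm_to_wav_bytes(raw_pcm, channels, sample_width, frame_rate):
    """Wrap headerless PCM bytes in a WAV container, entirely in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample, e.g. 2 for 16-bit
        wav_file.setframerate(frame_rate)
        wav_file.writeframes(raw_pcm)
    return buffer.getvalue()

# Stand-in for decoded audio: one second of 16-bit mono silence at 16 kHz.
silence = b"\x00\x00" * 16000
wav_bytes = pcm_to_wav_bytes(silence, channels=1, sample_width=2, frame_rate=16000)
```

With pydub, the arguments would presumably come from the segment's raw_data, channels, sample_width and frame_rate attributes, or the whole step could be replaced by AudioSegment.export to a BytesIO with format="wav". Whether the Audio feature then accepts entries of the form {"bytes": wav_bytes, "path": None} is an assumption I would still need to verify.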