How to create a dataset of "audio-like" files for ASR

Hi everyone!

I’m new to Hugging Face and I have a quick question about how to get started.

I have a bunch of “audio-like” data on which I would like to perform ASR with a custom tokenizer (trained on DNA sequences). The data consists of large vectors of numbers, similar to audio traces, each paired with the ground-truth DNA sequence that the signal corresponds to. How should I go about building a Dataset object for use with the Transformer models? Can I create one from scratch and then feed it into existing ASR architectures to train all the weights?
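A minimal sketch of one possible data layout, assuming the signals fit in memory as lists of floats and using a hypothetical character-level A/C/G/T vocabulary (the id 0 reserved for padding / CTC blank is also an assumption). Each example pairs a raw signal with its DNA string and integer labels, in the column-oriented dict that `datasets.Dataset.from_dict` accepts. The conversion to a `Dataset` only runs if the `datasets` library is installed.

```python
import random

# Hypothetical character-level vocabulary for DNA; id 0 is reserved as a
# padding / CTC "blank" token (an assumption, not a library requirement).
VOCAB = {"<pad>": 0, "A": 1, "C": 2, "G": 3, "T": 4}


def encode_dna(seq):
    """Map a DNA string such as "ACGT" to a list of integer label ids."""
    return [VOCAB[base] for base in seq.upper()]


def build_columns(signals, sequences):
    """Pair each raw signal with its ground-truth DNA sequence.

    Returns a column-oriented dict (one list per field), the layout
    that datasets.Dataset.from_dict takes as input.
    """
    if len(signals) != len(sequences):
        raise ValueError("need exactly one DNA sequence per signal")
    return {
        "input_values": [[float(x) for x in sig] for sig in signals],
        "sequence": list(sequences),
        "labels": [encode_dna(s) for s in sequences],
    }


if __name__ == "__main__":
    # Synthetic stand-in data: two signal traces of different lengths.
    random.seed(0)
    signals = [[random.gauss(0.0, 1.0) for _ in range(n)] for n in (400, 250)]
    sequences = ["ACGTTGCA", "GGCAT"]

    columns = build_columns(signals, sequences)
    print(columns["labels"])  # [[1, 2, 3, 4, 4, 3, 2, 1], [3, 3, 2, 1, 4]]

    try:
        from datasets import Dataset  # optional dependency
    except ImportError:
        Dataset = None

    if Dataset is not None:
        ds = Dataset.from_dict(columns)
        print(ds)
```

The column names `input_values` and `labels` are chosen to mirror the arguments that CTC models such as `Wav2Vec2ForCTC` take in their forward pass; whether an audio-pretrained architecture suits these signals is a separate question.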

Any help would be appreciated.