How to create a dataset for "audio-like" files for ASR

billytcl · April 10, 2023, 12:10pm

Hi everyone!

I’m new to Hugging Face and I had a quick question on how to get started.

I have a collection of “audio-like” data on which I would like to perform ASR with a custom tokenizer (trained on DNA sequences). The data are large vectors of numbers similar to audio traces, each with a corresponding ground-truth DNA sequence. How should I go about generating the Dataset object for use with the Transformers models? Can I make one from scratch and then feed it into existing ASR architectures to train all the weights?

Any help would be appreciated.
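
For illustration, here is a minimal sketch of one way such a dataset could be built from in-memory data. It assumes the Hugging Face `datasets` library and its `Dataset.from_dict` constructor. The vocabulary, the helper names (`encode_sequence`, `make_columns`, `to_hf_dataset`), and the synthetic signals are all hypothetical stand-ins, not part of any real pipeline; a trained tokenizer would replace the character-level mapping used here.

```python
import random

# Hypothetical character-level DNA vocabulary; id 0 is reserved for
# padding / the CTC blank (an assumption made for this sketch).
VOCAB = {"<pad>": 0, "A": 1, "C": 2, "G": 3, "T": 4}


def encode_sequence(seq):
    """Map a DNA string to a list of integer label ids."""
    return [VOCAB[base] for base in seq.upper()]


def make_columns(signals, sequences):
    """Build the column dict that Dataset.from_dict accepts."""
    if len(signals) != len(sequences):
        raise ValueError("each signal needs exactly one target sequence")
    return {
        "input_values": [[float(x) for x in s] for s in signals],
        "labels": [encode_sequence(seq) for seq in sequences],
    }


def to_hf_dataset(columns):
    """Wrap the columns in a datasets.Dataset (needs `pip install datasets`)."""
    # Imported lazily so the helpers above work without the library installed.
    from datasets import Dataset
    return Dataset.from_dict(columns)


# Synthetic stand-in data: two variable-length traces with their targets.
rng = random.Random(0)
signals = [[rng.gauss(0.0, 1.0) for _ in range(n)] for n in (50, 80)]
sequences = ["ACGT", "GGATC"]
columns = make_columns(signals, sequences)

# With `datasets` installed:
# dataset = to_hf_dataset(columns)
```

The column names `input_values` and `labels` are chosen because Wav2Vec2-style CTC models in `transformers` accept arguments with those names; other architectures may expect different inputs, and variable-length examples would still need padding in a data collator.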
