Hello, I am studying at university and trying to learn how to use the Hugging Face libraries.
I wanted to do a simple test: record myself and run a pre-trained Automatic Speech Recognition model on the recording, but it looks like the Hugging Face libraries only support using datasets. Is there a way around this? Will I have to convert my audio recording into a dataset format? How can I accomplish my goal?
Thank you.
Hi, I am not in the speech recognition field, but I assume the framework is the same as for other tasks.
It’s better to use the Hugging Face Dataset format, but it’s not compulsory.
It’s easy to switch to PyTorch or TF format.
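In fact, for your exact use case the `pipeline` API accepts a raw NumPy array or a plain file path directly, with no Dataset involved. A minimal sketch, assuming the `openai/whisper-tiny.en` checkpoint (any ASR checkpoint would do) and one second of silence standing in for your recording:

```python
import numpy as np
from transformers import pipeline

# Any ASR checkpoint works here; whisper-tiny.en is just a small example model.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny.en")

# A real recording would come from e.g. soundfile.read("my_recording.wav");
# here one second of silence at 16 kHz stands in for it.
audio = np.zeros(16000, dtype=np.float32)

result = asr({"raw": audio, "sampling_rate": 16000})
print(result["text"])
```

You could equally call `asr("my_recording.wav")` with a path to your own audio file.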
For example, in the NLP field, I first load the data into a pandas DataFrame.
Second, I convert it to an HF Dataset using Dataset.from_pandas(df).
After tokenizing, I convert it to PyTorch format and batch it with the DataLoader class.
I also use a native PyTorch LLM that is an HF pre-trained model.
Just take some time to read the tutorials and docs and you’ll figure it out. (It took me more than a month since I am a slow learner.)
GL!