Run on single local file rather than dataset

Hakase-Noonna · January 30, 2024, 6:42am

Hi, I'm not in the audio recognition field, but I assume the workflow is the same as for other tasks.

It's better to use the Hugging Face Dataset format, but it's not compulsory.
It's easy to switch from it to PyTorch or TensorFlow format.

For example, for language tasks, I first load the data into a pandas DataFrame.
Second, I convert it to an HF Dataset with Dataset.from_pandas(DF).
After tokenizing, I switch it to PyTorch format and feed it to the model through the DataLoader class.

I also use a native PyTorch LLM, which is an HF pre-trained model.
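HF Transformers models for PyTorch are ordinary `torch.nn.Module` subclasses, so they drop into a plain PyTorch training loop. A minimal sketch: to stay offline it builds a tiny, randomly initialized BERT classifier from a config (the sizes are arbitrary assumptions). With a real pre-trained model you would call `from_pretrained` with a checkpoint name instead.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

torch.manual_seed(0)

# Tiny, randomly initialized model (NOT pre-trained); sizes are arbitrary.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
model = BertForSequenceClassification(config)

# One step of a plain PyTorch training loop.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
input_ids = torch.randint(0, 100, (4, 8))  # fake batch: 4 sequences of 8 ids
labels = torch.tensor([0, 1, 0, 1])

model.train()
out = model(input_ids=input_ids, labels=labels)  # loss is computed when labels are passed
out.loss.backward()
optimizer.step()
optimizer.zero_grad()

print(out.logits.shape)
```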

Just take some time to read the tutorials and docs, and you'll figure it out. (It took me more than a month, since I'm a slow learner.)
GL!

Related topics (category · replies · views · date):
- How to do that trained huggingface model speech recognation? · DeepSpeed · 0 replies · 402 views · December 10, 2021
- How to use load_dataset to load my own local dataset? · 🤗Datasets · 1 reply · 920 views · May 24, 2023
- Audio dataset without uploading the data to the hub · 🤗Datasets · 6 replies · 1970 views · March 20, 2023
- Convert from HF audio dataset to raw audio file · 🤗Datasets · 1 reply · 856 views · November 22, 2023
- Converting finetuned Pytorch Whisper model to HF · Beginners · 0 replies · 694 views · January 15, 2023