How do I train a Hugging Face speech recognition model?

How do I train a Hugging Face speech recognition model on my own dataset, using recordings of my own voice made with my own recorder? How can I get started? I don’t know how the dataset should be structured. Any help would be very welcome.
How do I store the audio files, how do I link each one to its text, and how should I organize everything?
I am looking for anyone on this planet who can help me.
Or should I look for the answer on Mars?!