How to perform inference on the pretrained model "autonlp-hindi-asr"?

I am trying to use the abhishek/autonlp-hindi-asr model from Hugging Face in a project.

What do I need to do to run inference with it, both on Colab and on my local machine?

Specifically, how do I feed in an audio file and get the transcribed text back?
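
To make the question concrete, here is a minimal sketch of what I imagine the code would look like. It assumes the checkpoint works with the generic `automatic-speech-recognition` pipeline in `transformers`, which I have not confirmed for this model. The helper names `wav_info` and `transcribe` are my own and hypothetical, and passing a file path to the pipeline needs `ffmpeg` installed for decoding.

```python
import wave


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) for a WAV file.

    Handy for checking the input first: speech models of this kind are
    usually trained on 16 kHz mono audio (an assumption for this checkpoint).
    """
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / float(rate)
    return rate, channels, duration


def transcribe(path, model_id="abhishek/autonlp-hindi-asr"):
    """Transcribe one audio file and return the recognized text.

    Assumes the model loads through the generic ASR pipeline; the first
    call downloads the weights from the Hugging Face Hub.
    """
    # Imported lazily so wav_info() works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Given a file path, the pipeline decodes the audio via ffmpeg and
    # returns a dict with the transcription under the "text" key.
    return asr(path)["text"]
```

If this is roughly right, I would expect `print(transcribe("sample.wav"))` to work the same way on Colab and locally after `pip install transformers torch`, where `sample.wav` is just a placeholder for my own recording.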