How to load the Common Voice dataset locally and fine-tune Whisper with it

Hi, I followed the Whisper fine-tuning article and wanted to use it for Persian, but unfortunately my internet connection is unstable and drops often, so I couldn't load datasets remotely. I'd like to download the dataset once and use it locally; how should I do that? I also tried a small custom dataset, but after training, the kernel died at the push-to-hub step and I couldn't continue. I tried pushing several times, but it just timed out and never finished.
What should I do about that? Please help me fix this.
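A minimal sketch of the local route, assuming a Common Voice release archive for Persian (`fa`) has been downloaded manually and extracted, so that the language folder contains split files such as `train.tsv` plus a `clips/` directory of audio files. The helper names `parse_common_voice_tsv`, `read_common_voice_split` and `to_hf_dataset` are hypothetical; only the `path` and `sentence` TSV columns are used, and the 🤗 `datasets` calls (`Dataset.from_dict`, `cast_column`, `Audio`) stand in for the remote `load_dataset` call from the article.

```python
import csv
import os
from typing import Dict, Iterable, List

# Assumed layout of a manually downloaded Common Voice release (hypothetical paths):
#   cv-corpus-<version>/fa/train.tsv
#   cv-corpus-<version>/fa/clips/<file>.mp3


def parse_common_voice_tsv(lines: Iterable[str], clips_dir: str) -> Dict[str, List[str]]:
    """Turn Common Voice TSV lines into column lists of audio file paths and transcripts."""
    audio, sentence = [], []
    # QUOTE_NONE keeps quote characters inside transcripts literal instead of parsing them
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        audio.append(os.path.join(clips_dir, row["path"]))
        sentence.append(row["sentence"])
    return {"audio": audio, "sentence": sentence}


def read_common_voice_split(lang_dir: str, split: str = "train") -> Dict[str, List[str]]:
    """Read one split file (e.g. train.tsv) from an extracted Common Voice language folder."""
    with open(os.path.join(lang_dir, f"{split}.tsv"), encoding="utf-8", newline="") as f:
        return parse_common_voice_tsv(f, os.path.join(lang_dir, "clips"))


def to_hf_dataset(columns: Dict[str, List[str]], sampling_rate: int = 16_000):
    """Wrap the columns in a datasets.Dataset that decodes and resamples audio lazily."""
    # Requires `pip install datasets` plus an audio decoding backend for mp3 files
    from datasets import Audio, Dataset

    ds = Dataset.from_dict(columns)
    # Whisper's feature extractor expects 16 kHz audio
    return ds.cast_column("audio", Audio(sampling_rate=sampling_rate))
```

For example, `common_voice_train = to_hf_dataset(read_common_voice_split("cv-corpus-<version>/fa", "train"))` (the folder name is a placeholder) should then go through the same preprocessing as the remote dataset, with no network access during training. If a stable connection is available even once, `load_dataset(...)` followed by `save_to_disk(...)` and later `load_from_disk(...)` is another way to keep a local copy. For the push step, setting `push_to_hub=False` in `Seq2SeqTrainingArguments` and calling `trainer.save_model("<local dir>")` keeps the fine-tuned model on disk, so it can be uploaded later (for example with `huggingface_hub.upload_folder`) without retraining.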
Also, during training only my CPU usage goes up while my GPU stays idle, so I'm not sure whether the GPU is being used at all. I installed TensorFlow with GPU support on both WSL2 and native Windows, and both detected my GPU with the test commands. How can I verify whether training actually uses my GPU?
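As far as I know, the Whisper fine-tuning article trains with PyTorch (through `Seq2SeqTrainer`), so a working TensorFlow GPU setup does not show whether PyTorch can reach the GPU. A minimal check, assuming PyTorch is the training backend, is to ask the installed PyTorch build directly; the function name `pytorch_gpu_report` is hypothetical, while `torch.cuda.is_available`, `torch.cuda.get_device_name` and `torch.version.cuda` are standard PyTorch calls.

```python
import importlib.util
from typing import Dict, Optional


def pytorch_gpu_report() -> Dict[str, Optional[object]]:
    """Report whether the installed PyTorch build can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed at all in this environment
        return {"torch_installed": False, "torch_cuda_build": None,
                "cuda_available": False, "device_name": None}
    import torch

    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        # CUDA version the wheel was built against; None for CPU-only wheels
        "torch_cuda_build": torch.version.cuda,
        "cuda_available": available,
        "device_name": torch.cuda.get_device_name(0) if available else None,
    }


print(pytorch_gpu_report())
```

If `torch_cuda_build` is `None`, the installed wheel is CPU-only and a CUDA-enabled PyTorch build is needed. While training runs, `nvidia-smi` in a second terminal should show the Python process and non-zero GPU memory use, and `trainer.args.device` should report a `cuda` device.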