Looking at the comments, those options do seem to depend on the model.
Do you think you could create a PR here: automatic_speech_recognition.py on the main branch of huggingface/huggingface_hub on GitHub?
We could ping some espnet maintainers to take a look.