@danijelpetkovic Do you have a reference for how you create your files without distortion?
Unfortunately, espnet and other libraries (except transformers) currently don’t support adding parameters. Every library has a different set of parameters, and maintaining each and every one of them would very quickly become tedious and would probably lead to clashes and bugs (not to mention the docs for all of that).
That being said, if we can improve the defaults for some models, we should definitely do so. The implementation for this API is defined here: huggingface_hub/automatic_speech_recognition.py at main · huggingface/huggingface_hub · GitHub
As you can see, it’s pretty bare-bones, so any help improving it is welcome.