This happens with Whisper as well. Quantizing does have an effect, both quantitatively and qualitatively. I've trained and tested hundreds of Whisper models out in the real world. I think what people need to be careful of are eval metrics. I've trained models that had great numbers but struggled with basic Japanese audio translations in practice. And the opposite: models with horrible WER that understood things that surprise me even now. Whisper models can be a bit odd, though.
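For anyone who wants to sanity-check their own eval numbers, here's a minimal sketch of the WER computation I'm talking about, using Hugging Face's `evaluate` library (the reference/prediction strings are made up for illustration):

```python
# Word error rate (WER): the standard Whisper eval metric, and the one
# that can look great while the model still fails in practice.
import evaluate

wer_metric = evaluate.load("wer")  # requires the `jiwer` package

references = ["the quick brown fox jumps over the lazy dog"]
predictions = ["the quick brown fox jumped over a lazy dog"]

# WER = (substitutions + insertions + deletions) / reference word count
wer = wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.2%}")
```

The catch is that WER weights every word equally, so a model can score well overall while consistently botching the few words that actually matter (names, negations, numbers), which is exactly the mismatch I kept running into.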
Whisper-tiny
trainable params: 37184640 || all params: 37760640 || trainable%: 98.47
trainable params: 20640384 || all params: 29503104 || trainable%: 69.96
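Those lines are in the format of PEFT's `print_trainable_parameters()`, but you can reproduce the same printout by hand. Here's a sketch; freezing the encoder is just an illustrative choice, not necessarily what produced the second line above (that run also had a smaller total, e.g. from quantization or a different config):

```python
# Count trainable vs. total parameters for whisper-tiny and print them
# in the same "trainable params || all params || trainable%" format.
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# Illustrative intervention: freeze the encoder so only the decoder trains.
for param in model.model.encoder.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable} || all params: {total} "
      f"|| trainable%: {100 * trainable / total:.2f}")
```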