Improving Whisper for Inference

  1. Good catch, I’ll fix this link. Here it is: Quantization
  2. A BetterTransformer model cannot be exported to ONNX at the moment. You can find more information in my reply here: Export a BetterTransformer to ONNX - #2 by regisss
    That said, you can absolutely combine ORT optimization and quantization. I’m not aware of a general rule about which one should be applied first, so I guess you’ll have to try both orders and compare the results. Maybe @fxmarty or @IlyasMoutawwakil know more about this?
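
To make "try both" a bit more concrete, here is a minimal sketch of a harness that applies the two steps in each order and scores the result. It uses only the standard library. The `optimize`, `quantize`, and `evaluate` functions are hypothetical placeholders that I made up for illustration, not Optimum APIs: you would replace them with your own ORT optimization call, your own quantization call, and a benchmark that returns whatever you care about (latency, WER, model size).

```python
from itertools import permutations
from typing import Any, Callable, Dict, Sequence, Tuple

# A step is a (label, function) pair; the function takes an artifact
# (for example a model directory path) and returns the transformed artifact.
Step = Tuple[str, Callable[[Any], Any]]


def compare_orders(
    start: Any,
    steps: Sequence[Step],
    evaluate: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Apply `steps` in every possible order and score each final artifact."""
    results: Dict[str, Any] = {}
    for order in permutations(steps):
        artifact = start
        for _, apply_step in order:
            artifact = apply_step(artifact)
        label = " -> ".join(name for name, _ in order)
        results[label] = evaluate(artifact)
    return results


# Hypothetical placeholders: swap in your real optimization and
# quantization code. Here they only tag the path so the flow is visible.
def optimize(model_path: str) -> str:
    return model_path + "+optimized"


def quantize(model_path: str) -> str:
    return model_path + "+quantized"


# Placeholder metric: returns the artifact name it would benchmark.
# A real version would load the model and measure latency or WER.
def evaluate(model_path: str) -> str:
    return model_path


results = compare_orders(
    "whisper_onnx",  # hypothetical path to the exported model
    [("optimize", optimize), ("quantize", quantize)],
    evaluate,
)
for label, score in results.items():
    print(label, "=>", score)
```

With real steps plugged in, you can compare the two entries in `results` and keep whichever order gives the better trade-off for your hardware.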