I learned that Hugging Face uses optimized ONNX models for inference on CPU. I tried to do something similar with the pretrained Keras VGG16 model using the keras-onnx package (see this GitHub issue), but I couldn't see any performance benefit. How exactly does Hugging Face optimize models under the hood?
These two blog posts might help: