How do I reduce DistilBERT model size?

hey @burrt there’s a few threads on the forum about making transformer models smaller / faster, e.g.

my standard recommendation is to try quantization followed by ONNX / ONNX Runtime. before doing that though, i’d first try to understand what’s causing the timeout on your endpoint - it might be unrelated to the model and you don’t want to spend a lot of time optimising the wrong thing :slight_smile:

1 Like