hey @burrt there are a few threads on the forum about making transformer models smaller / faster, e.g.
my standard recommendation is to try quantization first, then export to ONNX and serve with ONNX Runtime. before doing that though, i'd try to understand what's actually causing the timeout on your endpoint - it might be unrelated to the model, and you don't want to spend a lot of time optimising the wrong thing
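if it does turn out to be the model, the quantization step is quick to try. here's a minimal sketch of dynamic quantization with PyTorch - i'm using a toy stand-in module here rather than a real transformer, so swap in your actual model:

```python
import torch
import torch.nn as nn

# toy stand-in for a transformer; in practice load your real model
# (e.g. transformers.AutoModel.from_pretrained(...))
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))
model.eval()

# dynamic quantization: weights stored as int8, activations quantized
# on the fly at inference time - Linear-heavy models usually shrink a lot
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
with torch.no_grad():
    out = quantized(x)
print(out.shape)  # torch.Size([1, 8])
```

benchmark latency before and after on your actual inputs - the win varies a lot by model and hardware, so don't assume it'll fix the timeout without measuring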