How do I reduce DistilBERT model size?

Hi everyone,
I recently started using Hugging Face's transformers library and used a BERT model to fit my data. After training on AWS SageMaker, each exported model is 300+ MB. Then I tried DistilBERT, which reduced the size to around 200 MB, but that is still too big to invoke when placed in a multi-model endpoint. Is there any way to reduce the size of DistilBERT even further so I can fit the models in the multi-model endpoint?

Hi! What makes it too big to invoke in a multi-model endpoint? Not enough time to deploy? Invocation too slow?

Check "Do we really need to distill? Joint Loss is all we need."

This model is 4x smaller than DistilBERT and beats it by 4 points on the GLUE score.


It seems that after I put the BERT models in a multi-model model and create an endpoint for it, invoking the endpoint with a specific target model times out. The BERT models are stored in S3 buckets, so I assume invocation is too slow.

hey @burrt there’s a few threads on the forum about making transformer models smaller / faster, e.g.

my standard recommendation is to try quantization followed by ONNX / ONNX Runtime. before doing that though, i’d first try to understand what’s causing the timeout on your endpoint - it might be unrelated to the model and you don’t want to spend a lot of time optimising the wrong thing :slight_smile:
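To get a feel for how much quantization could save before investing time in it, here is a back-of-the-envelope sketch. It assumes the default `distilbert-base-uncased` configuration (the constants below), ignores any task-specific head, and assumes that only the weights of the fully connected layers are stored as int8 while everything else stays in fp32, which is roughly what dynamic quantization of linear layers does. Helper names are made up for illustration, and real file sizes will differ somewhat because of serialization overhead and quantization metadata.

```python
# Rough size estimate for a DistilBERT encoder before and after
# int8 quantization of its fully connected (Linear) weights.
# Assumption: default distilbert-base-uncased configuration values.
VOCAB_SIZE = 30522
MAX_POSITIONS = 512
DIM = 768
HIDDEN_DIM = 3072
N_LAYERS = 6


def linear_params(n_in, n_out):
    """Return (weight_count, bias_count) for a fully connected layer."""
    return n_in * n_out, n_out


def distilbert_param_counts():
    """Return (linear_weight_params, all_other_params) for the encoder."""
    layer_norm = 2 * DIM  # scale and shift vectors
    embeddings = VOCAB_SIZE * DIM + MAX_POSITIONS * DIM + layer_norm
    linear_weights = 0
    other = embeddings
    for _ in range(N_LAYERS):
        # Four attention projections (q, k, v, output) plus two feed-forward layers
        layers = [linear_params(DIM, DIM)] * 4
        layers += [linear_params(DIM, HIDDEN_DIM), linear_params(HIDDEN_DIM, DIM)]
        for weights, biases in layers:
            linear_weights += weights
            other += biases  # biases are assumed to stay in fp32
        other += 2 * layer_norm  # two LayerNorms per transformer block
    return linear_weights, other


def estimate_mb(linear_weights, other, linear_bytes):
    """Size in MB when Linear weights use linear_bytes each and the rest use 4."""
    return (linear_weights * linear_bytes + other * 4) / 1e6


linear_weights, other = distilbert_param_counts()
fp32_mb = estimate_mb(linear_weights, other, linear_bytes=4)
int8_mb = estimate_mb(linear_weights, other, linear_bytes=1)
print(f"parameters: {linear_weights + other:,}")
print(f"fp32: ~{fp32_mb:.0f} MB, int8 Linear weights: ~{int8_mb:.0f} MB")
```

Under these assumptions the embeddings dominate what is left after quantization, since they are not touched. In PyTorch, the matching call would be `torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)`; quantizing through ONNX Runtime may treat some layers differently, so measure the actual exported file rather than relying on this estimate.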

I agree with @lewtun; I'd first diagnose what's causing the timeout. You can add logs to your inference functions (model loading and inference) and check the ModelLoadingWaitTime SageMaker endpoint metric in CloudWatch to see which steps take time. If model loading is the issue, according to the SageMaker docs it may speed things up to use a "d" instance type, which has local NVMe SSDs instead of virtual EBS drives for storage. Also, try using bigger instances (in AWS, network bandwidth is generally proportional to instance size).
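As a minimal sketch of the logging suggestion, the snippet below wraps the inference handlers in a timing decorator. The `timed` decorator and the logger name are made up for illustration, and the handler bodies are placeholders; it assumes your inference script defines `model_fn(model_dir)` and `predict_fn(input_data, model)` in the usual SageMaker PyTorch style.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("inference_timing")  # hypothetical logger name


def timed(fn):
    """Log how long each call to fn takes, in milliseconds."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Runs even if fn raises, so failed calls are timed as well
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s took %.1f ms", fn.__name__, elapsed_ms)
    return wrapper


@timed
def model_fn(model_dir):
    # Placeholder: a real handler would load the model from model_dir here
    return {"model_dir": model_dir}


@timed
def predict_fn(input_data, model):
    # Placeholder: a real handler would run the model on input_data here
    return {"input": input_data, "model_dir": model["model_dir"]}
```

With this in place, the endpoint logs should contain one timing line per call, which makes it easier to tell whether the time goes into loading the model or into inference itself.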

If using regular SageMaker endpoints is enough (and not SageMaker MME), you can take a look at this demo, which deploys the bigger bert-base-cased on SageMaker: GitHub - aws-samples/amazon-sagemaker-bert-classify-pytorch: This sample show you how to train BERT on Amazon Sagemaker using Spot instances

Thank you for your reply. SageMaker endpoints (not MME) are enough. When I deploy an endpoint with a single BERT model, it responds normally and can predict. When I deploy a multi-model endpoint with one or more BERT models in it, CloudWatch just says model_fn is not provided, even though it is in the script inside the model artifacts.