How do I reduce DistilBERT model size?

burrt · March 25, 2021, 10:36pm

Hi everyone,
I recently started using Hugging Face’s Transformers library and used a BERT model to fit my data. After training on AWS SageMaker, each exported model is 300+ MB. I then tried DistilBERT, which reduced the size to around 200 MB, but that is still too big to invoke from a multi-model endpoint. Is there any way to reduce the size of DistilBERT further so the models fit in a multi-model endpoint?

OlivierCR · April 1, 2021, 5:46pm

Hi! What makes it too big to invoke in a multi-model endpoint? Not enough time to deploy? Invocation too slow?

jominmathew · April 2, 2021, 8:01am

Check out “Do we really need to distill? Joint Loss is all we need.”

This model is 4x smaller than DistilBERT and scores 4 points higher on GLUE.

burrt · April 6, 2021, 12:45am

After I put the BERT models into a multi-model endpoint and tried to invoke it while targeting a specific model, the request timed out. The BERT models are stored in S3 buckets, so I assume invocation is too slow.

lewtun · April 6, 2021, 9:13pm

hey @burrt there’s a few threads on the forum about making transformer models smaller / faster, e.g.

my standard recommendation is to try quantization followed by ONNX / ONNX Runtime. before doing that though, i’d first try to understand what’s causing the timeout on your endpoint - it might be unrelated to the model and you don’t want to spend a lot of time optimising the wrong thing
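To make the quantization suggestion concrete, here is a minimal, library-free sketch of symmetric int8 quantization on a toy weight list. It is only an illustration of why quantized weights take roughly a quarter of the fp32 storage; the weight values and the helper names `quantize_int8` and `dequantize` are made up for this example and are not part of any library.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8 (hypothetical helper)."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = array("b", (round(w / scale) for w in weights))
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


# Toy "layer" standing in for a real weight matrix (values are made up).
weights = array("f", [0.42, -1.27, 0.05, 0.9, -0.33, 1.0, -0.76, 0.12])
quantized, scale = quantize_int8(weights)

fp32_bytes = weights.itemsize * len(weights)
int8_bytes = quantized.itemsize * len(quantized)
max_error = max(abs(a - b) for a, b in zip(weights, dequantize(quantized, scale)))

print(f"fp32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"max round-trip error: {max_error:.4f} (scale={scale:.4f})")
```

In practice a library would do this for you; in PyTorch, `torch.quantization.quantize_dynamic` is one option for `nn.Linear` layers. The overall file size usually shrinks by less than 4x, since not every layer gets quantized.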

OlivierCR · April 7, 2021, 6:44pm

I agree with @lewtun: I’d first diagnose what’s causing the timeout. You can add logs to your inference functions (model loading and inference) and check the ModelLoadingWaitTime SageMaker endpoint metric in CloudWatch to see which steps take time. If model loading is the issue, according to the SageMaker docs it may speed things up to use a “d” instance type, which has local NVMe SSDs instead of virtual EBS drives for storage. Also, try bigger instances (in AWS, network bandwidth is generally proportional to instance size).
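As a rough sketch of what adding logs to model loading and inference could look like, the decorator below times each call and writes the duration to the log. The `log_duration` helper, the function bodies, and the `/opt/ml/model` path are placeholders for illustration; a real `model_fn` and `predict_fn` would load and run the actual model.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("inference_timing")


def log_duration(fn):
    """Log how long each call to fn takes, even if it raises (hypothetical helper)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            logger.info("%s took %.3f s", fn.__name__, elapsed)
    return wrapper


@log_duration
def model_fn(model_dir):
    # Placeholder: a real handler would load the model from model_dir here.
    time.sleep(0.05)
    return {"model_dir": model_dir}


@log_duration
def predict_fn(input_data, model):
    # Placeholder: a real handler would run the forward pass here.
    return {"input": input_data, "loaded_from": model["model_dir"]}


model = model_fn("/opt/ml/model")
result = predict_fn("some text", model)
```

Comparing the logged load time with the logged prediction time should show whether the timeout comes from loading the model or from inference itself.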

If regular SageMaker endpoints are enough (rather than SageMaker MME), you can take a look at this demo, which deploys the bigger bert-base-cased on SageMaker: aws-samples/amazon-sagemaker-bert-classify-pytorch on GitHub (a sample that shows how to train BERT on Amazon SageMaker using Spot Instances).

burrt · April 12, 2021, 9:11pm

Thank you for your reply. SageMaker endpoints (not MME) are enough. After I deploy an endpoint with a single BERT model, it responds normally and can predict. After I deploy a multi-model endpoint with one or more BERT models in it, CloudWatch just says model_fn is not provided, even though it is in the script inside the model artifacts.
