Why do the saved standard English BertModel and the multilingual BertModel have drastically different sizes?

When I do

import os
import torch
from transformers import BertConfig, BertModel

bert_config = BertConfig.from_pretrained("bert-base-multilingual-cased")
model = BertModel(bert_config)
torch.save(model.state_dict(), "temp.p")
print("Size (MB):", os.path.getsize("temp.p") / 1e6)

the saved model size is around 711 MB. But if I do the same for 'bert-base-uncased' or 'bert-base-cased', the saved model is roughly 270 MB smaller, which is actually what I'd expect given the 110M parameters. Is this expected, or does anyone know what I might potentially be doing wrong?

Hey @mkumar10,
The 'bert-base-multilingual-cased' model has a much larger vocab_size (119547), so its embedding matrix is bigger than the standard English BERT model's. That is why it takes more memory on disk.
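A quick back-of-the-envelope check of whether the extra embedding rows alone account for the gap (a sketch; the multilingual vocab size 119547 is from this thread, while the bert-base-cased vocab size of 28996 and the hidden size of 768 are assumed standard BERT-base values):

```python
multilingual_vocab = 119547  # vocab_size of bert-base-multilingual-cased (from the thread)
english_vocab = 28996        # vocab_size of bert-base-cased (assumed)
hidden_size = 768            # BERT-base hidden dimension (assumed)
bytes_per_param = 4          # float32 weights

# Extra word-embedding parameters in the multilingual model
extra_params = (multilingual_vocab - english_vocab) * hidden_size
extra_mb = extra_params * bytes_per_param / 1e6
print(f"Extra embedding size: {extra_mb:.0f} MB")  # ≈ 278 MB
```

That lines up closely with the ~270 MB difference you observed, so the embedding matrix does account for essentially all of it.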

That makes sense, and that's what I suspected, but 270 MB is really substantial, so I just wanted to confirm that this is the only reason.