When I do
import os
import torch
from transformers import BertConfig, BertModel

bert_config = BertConfig.from_pretrained("bert-base-multilingual-cased")
model = BertModel(bert_config)
torch.save(model.state_dict(), "temp.p")
print("Size (MB):", os.path.getsize("temp.p") / 1e6)
and save the model, the file size is around 711 MB. But if I do the same for "bert-base-uncased" or "bert-base-cased", the size comes out around 440 MB, which is what I'd expect for ~110M parameters in fp32 (110M × 4 bytes ≈ 440 MB). Is the larger multilingual checkpoint expected, or does anyone know what I might be doing wrong?
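For what it's worth, here is a back-of-the-envelope check I tried. The two models share the same hidden size (768), but the published vocabulary sizes differ (30,522 for bert-base-uncased vs. 119,547 for bert-base-multilingual-cased), so the token-embedding matrix alone could account for most of the gap; the numbers below are just that arithmetic, not anything measured from the checkpoints:

```python
# Rough estimate of the extra fp32 storage from the larger embedding matrix.
# Vocab sizes are the published values for the two configs; hidden size is 768 for both.
hidden = 768
vocab_uncased = 30522     # bert-base-uncased
vocab_multi = 119547      # bert-base-multilingual-cased

extra_params = (vocab_multi - vocab_uncased) * hidden
extra_mb = extra_params * 4 / 1e6   # fp32 = 4 bytes per parameter
print(f"Extra embedding parameters: {extra_params:,}")
print(f"Approximate extra size: {extra_mb:.1f} MB")
```

That comes to roughly 273 MB on top of the ~440 MB base model, which is in the right neighborhood of the 711 MB I'm seeing, so maybe the size is simply expected?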