I am trying to reduce the size of a 2.4 GB mBART model using PyTorch dynamic quantization:
import torch
from transformers import MBartConfig, MBartForConditionalGeneration

# Load the fine-tuned model from Drive
config = MBartConfig.from_pretrained("/content/drive/MyDrive/Translation/modelforquantization/model")
model = MBartForConditionalGeneration.from_pretrained("/content/drive/MyDrive/Translation/modelforquantization/model", config=config)
model.eval()

# Dynamically quantize all Linear layers to int8
quantized_model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
print(quantized_model)

# Save the quantized weights
torch.save(quantized_model.state_dict(), "/content/drive/MyDrive/Translation/newww_new_quant.bin")
> This code reduced the size to 1.53 GB, but when I tried inference the output was just an empty string.
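A likely cause of the empty output is the reload step: the saved state dict contains packed qint8 parameters, so it must be loaded into a model that has already been converted with `quantize_dynamic`. Loading it into a plain float model (e.g. one freshly returned by `from_pretrained`) mismatches the weights. Below is a minimal sketch of the save/reload pattern using a small stand-in module (`Tiny` and the file name `quant_demo.bin` are placeholders, not part of the original code; the same pattern applies to the mBART model):

```python
import torch
import torch.nn as nn

# Small stand-in module; the real code uses MBartForConditionalGeneration,
# but the reload pattern is the same.
class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

torch.manual_seed(0)
model = Tiny().eval()

# Quantize Linear layers to int8 and save, mirroring the question's code
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
torch.save(quantized.state_dict(), "quant_demo.bin")

# Key point: re-apply quantize_dynamic to a fresh model BEFORE loading the
# state dict. Loading packed qint8 params into a plain float model fails or
# leaves the weights mismatched, which can yield garbage/empty generations.
fresh = Tiny().eval()
fresh = torch.quantization.quantize_dynamic(fresh, {nn.Linear}, dtype=torch.qint8)
state = torch.load("quant_demo.bin", weights_only=False)  # packed params are not plain tensors
fresh.load_state_dict(state)

# The reloaded model reproduces the original quantized model's outputs
x = torch.randn(1, 4)
assert torch.allclose(quantized(x), fresh(x))
```

For mBART this would mean calling `from_pretrained`, then `quantize_dynamic` with the same arguments as at save time, and only then `load_state_dict` on the resulting model.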