I trained a BERT model using PyTorch Lightning, and now I want to load it into Optimum for inference. How can I do that?
I tried to save it with
torch.save(model.bertmodel.state_dict(), 'bert.pth')
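For reference, here is a minimal self-contained sketch of that save step. TinyModel is a hypothetical stand-in for the BERT module inside my LightningModule; the point is that torch.save of a state_dict produces a single file of tensors, with no config.json or folder layout.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the Lightning module's inner model
# (the real code saves `model.bertmodel.state_dict()`).
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

torch.save(TinyModel().state_dict(), "bert.pth")

# The .pth file contains only a dict of weight tensors, not a model folder.
state = torch.load("bert.pth")
print(sorted(state.keys()))  # → ['linear.bias', 'linear.weight']
```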
and then tried to load it in Optimum with
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# The type of quantization to apply
qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)
quantizer = ORTQuantizer.from_pretrained('bert.pth', feature="sequence-classification")

# Quantize the model!
quantizer.export(
    onnx_model_path="model.onnx",
    onnx_quantized_model_output_path="model-quantized.onnx",
    quantization_config=qconfig,
)
The error it throws is
OSError: bert.pth is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
Is there a way to handle this without uploading the model to the Hugging Face Hub?