Load a PyTorch-trained model via Optimum

I trained a BERT model using PyTorch Lightning, and now I want to load it into Optimum for inference. How can I do that?
I tried saving it as

torch.save(model.bertmodel.state_dict(), 'bert.pth')

and then tried to load it in Optimum as

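from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# The type of quantization to apply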
qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)
quantizer = ORTQuantizer.from_pretrained('bert.pth', feature="sequence-classification")

# Quantize the model!
quantizer.export(
    onnx_model_path="model.onnx",
    onnx_quantized_model_output_path="model-quantized.onnx",
    quantization_config=qconfig,
)

The error it throws is

OSError: bert.pth is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

Is there a way to handle this without uploading the model to Hugging Face?

@Talha can you load your model weights with the transformers AutoModelForXXX classes?
If you can, pass the directory where pytorch_model.bin is stored to the from_pretrained method. This directory also needs to include the tokenizer files.

We are currently working on removing this transformers dependency to make it easier to use with raw ONNX checkpoints, but for that you would also need a model.onnx file. Looking at your snippet, it seems that you currently only have a PyTorch checkpoint.

Currently I am saving it via

tokenizer.save_pretrained("model")
model.bertmodel.save_pretrained("model")

and it's working, but it doesn't let me modify the last layer of the network, where I just want to change the number of output neurons.

EDIT:
I was wrong: it is not saving my own trained model but the original model from Hugging Face.

So I saved it this way

import os

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=False)

output_dir = "./model/"

# Step 1: Save a model, configuration and vocabulary that you have fine-tuned

# If we have a distributed model, save only the encapsulated model
# (it was wrapped in PyTorch DistributedDataParallel or DataParallel)
model_to_save = model.module if hasattr(model, 'module') else model

# If we save using the predefined names, we can load using `from_pretrained`
output_model_file = os.path.join(output_dir, 'pytorch_model.bin')
output_config_file = os.path.join(output_dir, 'config.json')

torch.save(model.bertmodel.state_dict(), output_model_file)
model.bertmodel.config.to_json_file(output_config_file)
tokenizer.save_vocabulary(output_dir)
# Load it again
# Example for a BERT model
model = AutoModelForSequenceClassification.from_pretrained(output_dir, num_labels=1)
tokenizer = AutoTokenizer.from_pretrained(output_dir)

But when I try to read it, I get this error

# The type of quantization to apply
qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)
quantizer = ORTQuantizer.from_pretrained("model", feature="sequence-classification")

# Quantize the model!
quantizer.export(
    onnx_model_path="model.onnx",
    onnx_quantized_model_output_path="model-quantized.onnx",
    quantization_config=qconfig,
)

Error
RuntimeError: Error(s) in loading state_dict for DistilBertForSequenceClassification:
size mismatch for classifier.weight: copying a param with shape torch.Size([1, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).
size mismatch for classifier.bias: copying a param with shape torch.Size([1]) from checkpoint, the shape in current model is torch.Size([2]).

I am looking for a way to define the number of labels in ORTQuantizer.from_pretrained

Can you run inference on your model? It seems that you have a mismatch between training and inference when saving the model. Are you sure you are saving the model with the classification head? It looks more like you are saving the BERT model without it.
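One way to look for such a mismatch is to check what head size the saved config.json implies and compare it with the classifier shape reported in the error. This is only a rough sketch: `implied_num_labels` is a hypothetical helper, the demo config contents are made up, and the fallback to 2 labels reflects my understanding of the transformers default when neither `id2label` nor `num_labels` is recorded.

```python
import json
import os
import tempfile


def implied_num_labels(model_dir):
    """Return the label count that config.json in model_dir appears to imply."""
    with open(os.path.join(model_dir, "config.json"), encoding="utf-8") as f:
        config = json.load(f)
    if config.get("id2label") is not None:
        return len(config["id2label"])
    # Assumption: transformers falls back to 2 labels when nothing is recorded.
    return config.get("num_labels", 2)


# Demo on a throwaway directory with hand-written (hypothetical) configs.
with tempfile.TemporaryDirectory() as demo_dir:
    config_path = os.path.join(demo_dir, "config.json")

    # A config that records no label information at all.
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump({"model_type": "distilbert"}, f)
    default_labels = implied_num_labels(demo_dir)

    # A config that records a single label.
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump({"model_type": "distilbert", "id2label": {"0": "LABEL_0"}}, f)
    single_label = implied_num_labels(demo_dir)

print(default_labels, single_label)
```

If the number this returns for the "model" directory differs from the first dimension of classifier.weight in the checkpoint, the config and the weights were saved from different versions of the model.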

You don't need to define num_labels after you have trained your model. It is only needed to initialize a classification head for fine-tuning, and that head should already exist in your model.
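To illustrate that point, here is a minimal sketch of a save and reload round trip where the head size travels with the saved config. It assumes transformers and torch are installed; the tiny randomly initialized DistilBERT and its config values are stand-ins I picked so that nothing has to be downloaded, not your actual model.

```python
import tempfile

from transformers import (
    AutoModelForSequenceClassification,
    DistilBertConfig,
    DistilBertForSequenceClassification,
)

# Tiny, randomly initialized stand-in for the fine-tuned model (hypothetical sizes).
config = DistilBertConfig(
    vocab_size=100, dim=32, hidden_dim=64, n_layers=1, n_heads=2, num_labels=1
)
model = DistilBertForSequenceClassification(config)

with tempfile.TemporaryDirectory() as output_dir:
    # save_pretrained writes the weights together with a config.json for this model.
    model.save_pretrained(output_dir)
    # No num_labels argument here: the head size is read from the saved config.
    reloaded = AutoModelForSequenceClassification.from_pretrained(output_dir)

print(reloaded.config.num_labels, reloaded.classifier.out_features)
```

In your case the equivalent would be to call save_pretrained on the model object that actually holds the classification head, plus tokenizer.save_pretrained into the same directory, and then point ORTQuantizer.from_pretrained at that directory. If the head was swapped by hand after loading, I would expect the config to need updating to match before saving, but that is an assumption about your setup.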

My suggestion would be to take a look at how to fine-tune it with transformers and Trainer rather than some custom logic, which might not be 100% compatible. Here is an example: notebooks/text_classification.ipynb at main · huggingface/notebooks · GitHub