Quantization of a customized model


I’m working on a BERT model for a sequence-classification task. However, for my use case I only want the features before the classifier layer. To do this I created a pseudo layer called Identity:

from typing import Any

import torch.nn as nn


class Identity(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x: Any):
        return x

# Replace the classification head with a pass-through layer
model.classifier = Identity()

However, there seem to be some issues when I try to quantize this model. If I keep it as a sequence-classification model:

quantizer = ORTQuantizer.from_pretrained(model_path, feature="sequence-classification")

it forcibly creates a new classifier:

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at ./models/ and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

If I set the feature to “default”:

quantizer = ORTQuantizer.from_pretrained(model_path, feature="default")
pred_val = ort_model.evaluation_loop(data).predictions

I get the following error:

InvalidArgument                           Traceback (most recent call last)
<timed exec> in <module>

~/.local/lib/python3.8/site-packages/optimum/onnxruntime/model.py in evaluation_loop(self, dataset)
     96                 labels = None
     97             onnx_inputs = {key: np.array([inputs[key]]) for key in self.onnx_config.inputs if key in inputs}
---> 98             preds = session.run(self.onnx_named_outputs, onnx_inputs)
     99             if len(preds) == 1:
    100                 preds = preds[0]

~/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py in run(self, output_names, input_feed, run_options)
    190             output_names = [output.name for output in self._outputs_meta]
    191         try:
--> 192             return self._sess.run(output_names, input_feed, run_options)
    193         except C.EPFail as err:
    194             if self._enable_fallback:

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Invalid Output Name:last_hidden_state

Can you please let me know the best way to quantize a customized model like this?

Hi @Kofi

If you want to quantize a BERT model trained for a sequence-classification task without its final classifier layer, you can do the following (without the pseudo layer):

from optimum.onnxruntime import ORTQuantizer

model_name = "textattack/bert-base-uncased-SST-2"
quantizer = ORTQuantizer.from_pretrained(model_name, feature="default")

Once exported, the resulting model will have the last_hidden_state as well as the pooled_output as outputs.
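
To check which outputs an exported model really exposes, and so track down an "Invalid Output Name" error like the one above, you can compare the names you request with the names the graph reports. Below is a minimal sketch: the helper `missing_output_names` is hypothetical (not part of optimum or onnxruntime), and the example names are made-up stand-ins rather than values taken from this thread. With onnxruntime, the available names would come from `session.get_outputs()`, as noted in the comments.

```python
from typing import Iterable, List


def missing_output_names(requested: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return the requested output names that the graph does not expose, in order."""
    available_set = set(available)
    return [name for name in requested if name not in available_set]


# With onnxruntime, the available names would be read from the session, e.g.:
#   import onnxruntime
#   session = onnxruntime.InferenceSession("model.onnx",  # hypothetical path
#                                          providers=["CPUExecutionProvider"])
#   available = [output.name for output in session.get_outputs()]
# The lists below are illustrative stand-ins, not values from the thread.
available = ["logits"]
requested = ["last_hidden_state"]

print(missing_output_names(requested, available))
```

If the returned list is non-empty, the ONNX file on disk was exported with different outputs than the ones being requested, which suggests re-exporting it with the intended feature.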