Finetuned distilbert-base-multilingual-cased on XNLI
Environment:
transformers==4.20.1
optimum==1.3.0
evaluate==0.2.2
I used the provided dynamic quantization API to export model-quantized.onnx, then loaded the ONNX model into a pipeline to test the accuracy. It seems like model-quantized.onnx is exported without weights… If I load model.onnx instead, the accuracy is back to normal. Is there something I missed in this part? How can I measure the accuracy of the quantized model?
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig
from optimum.onnxruntime import ORTModelForSequenceClassification
from pathlib import Path
from transformers import AutoTokenizer, pipeline
from evaluate import evaluator
from datasets import load_dataset, load_metric
model_path = "/tmp/en_en"
onnx_path = Path('./onnx/')
onnx_path.mkdir(exist_ok=True)
# Export the vanilla and the dynamically quantized ONNX models (optimum 1.3.0 API)
quantizer = ORTQuantizer.from_pretrained(model_path, feature="sequence-classification")
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=True)
quantizer.export(
    onnx_model_path=onnx_path / "model.onnx",
    onnx_quantized_model_output_path=onnx_path / "model-quantized.onnx",
    quantization_config=qconfig,
)
quantizer.model.config.save_pretrained(onnx_path)  # saves config.json

# Load the non-quantized export
model = ORTModelForSequenceClassification.from_pretrained(onnx_path, file_name="model.onnx")
eval_dataset = load_dataset("xnli", "en", split="validation")
task_evaluator = evaluator("text-classification")
def preprocess_function(example):
    # The text-classification evaluator expects a single input column; pack premise/hypothesis as a text pair
    example["input"] = {"text": example["premise"], "text_pair": example["hypothesis"]}
    return example

eval_dataset = eval_dataset.map(
    preprocess_function,
    batched=False,
    desc="Building pipeline inputs for the validation set",
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
onnx_classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

eval_results = task_evaluator.compute(
    model_or_pipeline=onnx_classifier,
    tokenizer=tokenizer,
    metric=load_metric("xnli"),
    input_column="input",
    label_column="label",
    data=eval_dataset,
    label_mapping={"LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2},
)
print(eval_results)
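To evaluate the quantized export I swapped in the other file name when loading; roughly like this (same tokenizer, pipeline, and task_evaluator.compute call as above):

# Load the dynamically quantized export instead of model.onnx
quantized_model = ORTModelForSequenceClassification.from_pretrained(
    onnx_path, file_name="model-quantized.onnx"
)
quantized_classifier = pipeline("text-classification", model=quantized_model, tokenizer=tokenizer)
# ...then rerun task_evaluator.compute(...) with quantized_classifier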
model.onnx:
  accuracy: 0.7815261044176707
model-quantized.onnx:
  accuracy: 0.3333333333333333
The quantized model's accuracy is the same as what the original (not finetuned) distilbert-base-multilingual-cased model gets on XNLI… Did I miss something? I also tested with optimum==1.4.0 and the issue is still there: the evaluation result for the dynamically quantized model is just as bad.
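To check whether model-quantized.onnx really lost its weights, I suppose the graph's initializers could be inspected directly (a minimal sketch using the onnx package; not part of the script above):

import onnx

# List the weight tensors (initializers) stored in the quantized graph;
# an export that truly lost its weights would show few or no entries here.
quantized = onnx.load("onnx/model-quantized.onnx")
for init in quantized.graph.initializer:
    print(init.name, list(init.dims))
print("total initializers:", len(quantized.graph.initializer))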
But when I switched to torch.quantization.quantize_dynamic, it works fine: the accuracy only drops a little.
accuracy: 0.7249
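For reference, the PyTorch dynamic quantization path I mean is roughly this (a minimal sketch of the standard quantize_dynamic usage, evaluated with the same pipeline/evaluator setup as above; not copied verbatim from my script):

import torch
from transformers import AutoModelForSequenceClassification

# Dynamically quantize the Linear layers of the finetuned PyTorch checkpoint to int8
pt_model = AutoModelForSequenceClassification.from_pretrained(model_path)
quantized_pt_model = torch.quantization.quantize_dynamic(
    pt_model, {torch.nn.Linear}, dtype=torch.qint8
)
pt_classifier = pipeline("text-classification", model=quantized_pt_model, tokenizer=tokenizer)
# ...then the same task_evaluator.compute(...) call as above, with pt_classifier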