Steps followed:
- Load the Facebook OPT PyTorch model
- Apply SmoothQuant smoothing
- Save the PyTorch model
import torch
import argparse
from transformers import AutoTokenizer, OPTForCausalLM
import os
import smooth
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig
import onnx
import onnxruntime as ort
import numpy as np
import time
import string
import random

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", help="Different OPT model sizes", type=str, default="opt-2.7b")
    parser.add_argument("--onnx", help="load model from huggingface, smoothquant and save onnx", action="store_true")
    args = parser.parse_args()

    if args.onnx:
        model = OPTForCausalLM.from_pretrained("facebook/" + args.model_name)
        tokenizer = AutoTokenizer.from_pretrained("facebook/" + args.model_name)
        model.tokenizer = tokenizer
        act_scales = torch.load(os.getenv("PYTORCH_PATH") + "/smoothquant/act_scales/" + "%s.pt" % args.model_name)
        smooth.smooth_lm(model, act_scales, 0.5)
        print(model)
        prompt = ''.join(random.choices(string.ascii_lowercase + " ", k=model.config.max_position_embeddings))
        #inputs = tokenizer(prompt, return_tensors="pt") # takes a lot of time
        inputs = tokenizer("What is meaning of life", return_tensors="pt")
        print(f"inputs: {inputs}")
        print(f"inputs.input_ids: {inputs.input_ids}")
        for key in inputs.keys():
            print(inputs[key].shape)
            print(inputs[key])
        model_out = model(inputs.input_ids)
        print(f"{(model_out.logits.shape)=}")
        out_dir = "./%s_smoothquant" % args.model_name
        if not os.path.exists(out_dir):
            os.makedirs(out_dir)
        model.save_pretrained(out_dir + "/pytorch")
- Use optimum-cli to convert the PyTorch model to an ONNX model
optimum-cli export onnx -m opt-2.7b_smoothquant\pytorch --task text-generation-with-past opt-2.7b_smoothquant\onnx --framework pt --no-post-process
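For reproducing this on other platforms, the same command can be assembled from Python so the paths do not depend on Windows-style backslashes. This is only a sketch: `build_export_cmd` and `run_export` are hypothetical helper names, it assumes `optimum-cli` is on the PATH, and the flags are exactly the ones used in the command above.

```python
import subprocess
from pathlib import Path


def build_export_cmd(model_dir, onnx_dir):
    """Return the optimum-cli argument list for the export shown above.

    Hypothetical helper, not part of optimum itself.
    """
    return [
        "optimum-cli", "export", "onnx",
        "-m", str(Path(model_dir)),
        "--task", "text-generation-with-past",
        str(Path(onnx_dir)),
        "--framework", "pt",
        "--no-post-process",
    ]


def run_export(model_dir, onnx_dir):
    """Run the export; raises CalledProcessError if optimum-cli fails."""
    subprocess.run(build_export_cmd(model_dir, onnx_dir), check=True)


if __name__ == "__main__":
    base = Path("opt-2.7b_smoothquant")
    # Only print the command here; call run_export(...) to actually export.
    print(" ".join(build_export_cmd(base / "pytorch", base / "onnx")))
```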
I am seeing this warning in the log: "The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05"
Please find the log below:
- logits: max diff = 14.66848087310791
- present.8.key: max diff = 1.1682510375976562e-05
- present.18.key: max diff = 7.810757637023926
- present.18.value: max diff = 3.938399076461792
- present.19.key: max diff = 5.7716522216796875
- present.19.value: max diff = 3.3612823486328125
- present.20.key: max diff = 3.3796701431274414
- present.20.value: max diff = 2.428762435913086
- present.21.key: max diff = 3.4451732635498047
- present.21.value: max diff = 2.12455153465271
- present.22.key: max diff = 3.651796817779541
- present.22.value: max diff = 2.4826853275299072
- present.23.key: max diff = 4.581423759460449
- present.23.value: max diff = 2.5400819778442383
- present.24.key: max diff = 10.186495780944824
- present.24.value: max diff = 5.759485244750977
- present.25.key: max diff = 10.965536117553711
- present.25.value: max diff = 6.26738977432251
- present.26.key: max diff = 2.7324962615966797
- present.26.value: max diff = 2.6341564655303955
- present.27.key: max diff = 4.848768711090088
- present.27.value: max diff = 5.190319061279297
- present.28.key: max diff = 5.36247444152832
- present.28.value: max diff = 5.969284534454346
- present.29.key: max diff = 5.126808166503906
- present.29.value: max diff = 6.3877997398376465
- present.30.key: max diff = 4.772220134735107
- present.30.value: max diff = 5.988787651062012
- present.31.key: max diff = 5.070601940155029
- present.31.value: max diff = 6.673184394836426
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- logits: max diff = 7.611649513244629
- present.18.key: max diff = 3.4201674461364746
- present.18.value: max diff = 2.020570755004883
- present.19.key: max diff = 2.3315491676330566
- present.19.value: max diff = 1.6240568161010742
- present.20.key: max diff = 1.7808356285095215
- present.20.value: max diff = 1.4954586029052734
- present.21.key: max diff = 1.8739492893218994
- present.21.value: max diff = 1.7475067377090454
- present.22.key: max diff = 2.3744964599609375
- present.22.value: max diff = 1.9387221336364746
- present.23.key: max diff = 2.560657024383545
- present.23.value: max diff = 2.3288230895996094
- present.24.key: max diff = 8.912554740905762
- present.24.value: max diff = 5.827574253082275
- present.25.key: max diff = 10.221282958984375
- present.25.value: max diff = 4.572129726409912
- present.26.key: max diff = 2.0476551055908203
- present.26.value: max diff = 2.327301502227783
- present.27.key: max diff = 3.158855676651001
- present.27.value: max diff = 3.6903629302978516
- present.28.key: max diff = 4.461582660675049
- present.28.value: max diff = 4.333994388580322
- present.29.key: max diff = 4.099486351013184
- present.29.value: max diff = 5.0107622146606445
- present.30.key: max diff = 3.9387893676757812
- present.30.value: max diff = 5.12119197845459
- present.31.key: max diff = 3.750337600708008
- present.31.value: max diff = 4.961455821990967.
The exported model was saved at: opt-2.7b_smoothquant/onnx
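In the logs above the difference is barely over tolerance at present.8.key and becomes large from present.18 onward. A small parser makes it easier to see where the divergence starts. This is a minimal sketch that assumes the `- name: max diff = value` line format shown above. The function names and the `1e-3` threshold are my own choices for illustration, not anything from optimum.

```python
import re

# Matches lines such as "- present.18.key: max diff = 7.810757637023926"
LINE_RE = re.compile(r"-\s*([\w.]+):\s*max diff\s*=\s*([0-9.eE+-]+)")


def parse_max_diffs(log_text):
    """Return (output_name, max_diff) pairs in the order they appear."""
    pairs = []
    for name, value in LINE_RE.findall(log_text):
        # The last log line ends with a sentence period; drop it before parsing.
        pairs.append((name, float(value.rstrip("."))))
    return pairs


def first_diverging_layer(pairs, threshold=1e-3):
    """Return the lowest layer index whose key/value diff exceeds threshold.

    The threshold is an assumed cut-off separating numerical noise
    from a real mismatch; returns None if no layer exceeds it.
    """
    layers = []
    for name, diff in pairs:
        match = re.fullmatch(r"present\.(\d+)\.(key|value)", name)
        if match and diff > threshold:
            layers.append(int(match.group(1)))
    return min(layers) if layers else None


if __name__ == "__main__":
    sample = """\
- logits: max diff = 14.66848087310791
- present.8.key: max diff = 1.1682510375976562e-05
- present.18.key: max diff = 7.810757637023926
- present.18.value: max diff = 3.938399076461792
"""
    pairs = parse_max_diffs(sample)
    print(first_diverging_layer(pairs))  # prints 18
```

Running it on the full log instead of the four sample lines gives the complete list of outputs that exceed the chosen threshold.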