ONNX export fails output validation for facebook/opt-2.7b with optimum CLI

Steps followed:

  1. Load the facebook/opt-2.7b PyTorch model
  2. Apply SmoothQuant smoothing (smooth.smooth_lm)
  3. Save the PyTorch model

import torch
import argparse
from transformers import AutoTokenizer, OPTForCausalLM
import os
import smooth
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig
import onnx
import onnxruntime as ort
import numpy as np
import time
import string
import random

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", help="Different OPT model sizes", type=str, default="opt-2.7b")
    parser.add_argument("--onnx", help="load model from huggingface, smoothquant and save onnx", action="store_true")
    args = parser.parse_args()

    if args.onnx:
        # Load the pretrained OPT model and its tokenizer
        model = OPTForCausalLM.from_pretrained("facebook/" + args.model_name)
        tokenizer = AutoTokenizer.from_pretrained("facebook/" + args.model_name)
        model.tokenizer = tokenizer

        # Apply SmoothQuant smoothing with precomputed activation scales (alpha = 0.5)
        act_scales = torch.load(os.getenv("PYTORCH_PATH") + "/smoothquant/act_scales/" + "%s.pt" % args.model_name)
        smooth.smooth_lm(model, act_scales, 0.5)
        print(model)

        prompt = ''.join(random.choices(string.ascii_lowercase + " ", k=model.config.max_position_embeddings))
        #inputs = tokenizer(prompt, return_tensors="pt")  # takes a lot of time
        inputs = tokenizer("What is meaning of life", return_tensors="pt")
        print(f"inputs: {inputs}")
        print(f"inputs.input_ids: {inputs.input_ids}")
        for key in inputs.keys():
            print(inputs[key].shape)
            print(inputs[key])

        # Run one forward pass as a sanity check before saving
        model_out = model(inputs.input_ids)
        print(f"{(model_out.logits.shape)=}")

        # Save the smoothed PyTorch model for the ONNX export step
        out_dir = "./%s_smoothquant" % args.model_name
        os.makedirs(out_dir, exist_ok=True)
        model.save_pretrained(out_dir + "/pytorch")

  4. Use the optimum CLI to convert the PyTorch model to an ONNX model

optimum-cli export onnx -m opt-2.7b_smoothquant\pytorch --task text-generation-with-past opt-2.7b_smoothquant\onnx --framework pt --no-post-process
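
For reference, the same export could probably be driven from Python, which makes it easier to rerun with a different tolerance. This is only a sketch: it assumes the installed optimum version exposes `main_export` in `optimum.exporters.onnx` with keyword names matching the CLI flags (`task`, `framework`, `no_post_process`, `atol`). `build_export_kwargs` and `run_export` are hypothetical helper names, not part of optimum.

```python
def build_export_kwargs(model_dir, onnx_dir, atol=None):
    """Collect keyword arguments mirroring the CLI flags in the command above."""
    kwargs = {
        "model_name_or_path": model_dir,
        "output": onnx_dir,
        "task": "text-generation-with-past",
        "framework": "pt",
        "no_post_process": True,
    }
    # Only pass a tolerance when one is requested explicitly.
    if atol is not None:
        kwargs["atol"] = atol
    return kwargs


def run_export(model_dir, onnx_dir, atol=None):
    """Run the export (assumption: main_export accepts these keyword names)."""
    # Imported lazily so build_export_kwargs works without optimum installed.
    from optimum.exporters.onnx import main_export

    main_export(**build_export_kwargs(model_dir, onnx_dir, atol))


kwargs = build_export_kwargs("opt-2.7b_smoothquant/pytorch", "opt-2.7b_smoothquant/onnx")
print(kwargs)
```

Calling `run_export("opt-2.7b_smoothquant/pytorch", "opt-2.7b_smoothquant/onnx")` would then perform the actual export.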

The export completes, but the log shows this warning: "The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05".
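
To make clear what the warning is measuring, here is a plain-Python sketch of a maximum-absolute-difference check against a tolerance. It assumes a simple element-wise comparison on flattened outputs; the exact comparison optimum performs may differ. `max_abs_diff`, `within_tolerance`, and the sample values are made up for illustration.

```python
def max_abs_diff(reference, exported):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(reference) != len(exported):
        raise ValueError("outputs must have the same number of elements")
    return max(abs(r - e) for r, e in zip(reference, exported))


def within_tolerance(reference, exported, atol=1e-5):
    """True when every element of the exported output is within atol of the reference."""
    return max_abs_diff(reference, exported) <= atol


# Illustrative values only, not taken from the real model outputs.
reference_logits = [0.12, -1.50, 3.25]
close_logits = [0.120001, -1.500002, 3.249999]
far_logits = [0.12, -1.50, 17.91]

print(within_tolerance(reference_logits, close_logits))  # True
print(within_tolerance(reference_logits, far_logits))    # False
```

A difference above 1 on the logits, as in my log, is far outside what floating-point rounding alone would explain under this kind of check.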

Please find the log below:

  • logits: max diff = 14.66848087310791
  • present.8.key: max diff = 1.1682510375976562e-05
  • present.18.key: max diff = 7.810757637023926
  • present.18.value: max diff = 3.938399076461792
  • present.19.key: max diff = 5.7716522216796875
  • present.19.value: max diff = 3.3612823486328125
  • present.20.key: max diff = 3.3796701431274414
  • present.20.value: max diff = 2.428762435913086
  • present.21.key: max diff = 3.4451732635498047
  • present.21.value: max diff = 2.12455153465271
  • present.22.key: max diff = 3.651796817779541
  • present.22.value: max diff = 2.4826853275299072
  • present.23.key: max diff = 4.581423759460449
  • present.23.value: max diff = 2.5400819778442383
  • present.24.key: max diff = 10.186495780944824
  • present.24.value: max diff = 5.759485244750977
  • present.25.key: max diff = 10.965536117553711
  • present.25.value: max diff = 6.26738977432251
  • present.26.key: max diff = 2.7324962615966797
  • present.26.value: max diff = 2.6341564655303955
  • present.27.key: max diff = 4.848768711090088
  • present.27.value: max diff = 5.190319061279297
  • present.28.key: max diff = 5.36247444152832
  • present.28.value: max diff = 5.969284534454346
  • present.29.key: max diff = 5.126808166503906
  • present.29.value: max diff = 6.3877997398376465
  • present.30.key: max diff = 4.772220134735107
  • present.30.value: max diff = 5.988787651062012
  • present.31.key: max diff = 5.070601940155029
  • present.31.value: max diff = 6.673184394836426
    The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
  • logits: max diff = 7.611649513244629
  • present.18.key: max diff = 3.4201674461364746
  • present.18.value: max diff = 2.020570755004883
  • present.19.key: max diff = 2.3315491676330566
  • present.19.value: max diff = 1.6240568161010742
  • present.20.key: max diff = 1.7808356285095215
  • present.20.value: max diff = 1.4954586029052734
  • present.21.key: max diff = 1.8739492893218994
  • present.21.value: max diff = 1.7475067377090454
  • present.22.key: max diff = 2.3744964599609375
  • present.22.value: max diff = 1.9387221336364746
  • present.23.key: max diff = 2.560657024383545
  • present.23.value: max diff = 2.3288230895996094
  • present.24.key: max diff = 8.912554740905762
  • present.24.value: max diff = 5.827574253082275
  • present.25.key: max diff = 10.221282958984375
  • present.25.value: max diff = 4.572129726409912
  • present.26.key: max diff = 2.0476551055908203
  • present.26.value: max diff = 2.327301502227783
  • present.27.key: max diff = 3.158855676651001
  • present.27.value: max diff = 3.6903629302978516
  • present.28.key: max diff = 4.461582660675049
  • present.28.value: max diff = 4.333994388580322
  • present.29.key: max diff = 4.099486351013184
  • present.29.value: max diff = 5.0107622146606445
  • present.30.key: max diff = 3.9387893676757812
  • present.30.value: max diff = 5.12119197845459
  • present.31.key: max diff = 3.750337600708008
  • present.31.value: max diff = 4.961455821990967.
    The exported model was saved at: opt-2.7b_smoothquant/onnx
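
To narrow down where the outputs start to diverge, the "max diff" lines can be parsed and grouped by layer. The sketch below uses only the standard library; `parse_max_diffs` and `first_layer_above` are hypothetical helper names, and the sample text is the first few lines of the log above.

```python
import re

# Matches lines such as "present.18.key: max diff = 7.810757637023926".
LINE_RE = re.compile(r"(?P<name>[\w.]+): max diff = (?P<diff>[0-9.eE+-]+)")


def parse_max_diffs(log_text):
    """Map each output name to its reported max diff."""
    diffs = {}
    for match in LINE_RE.finditer(log_text):
        # rstrip handles a value that ends a sentence with a trailing period.
        diffs[match.group("name")] = float(match.group("diff").rstrip("."))
    return diffs


def first_layer_above(diffs, threshold):
    """Lowest layer index whose present.N.key/value diff exceeds threshold."""
    layers = [
        int(name.split(".")[1])
        for name, diff in diffs.items()
        if name.startswith("present.") and diff > threshold
    ]
    return min(layers) if layers else None


sample_log = """
logits: max diff = 14.66848087310791
present.8.key: max diff = 1.1682510375976562e-05
present.18.key: max diff = 7.810757637023926
present.18.value: max diff = 3.938399076461792
"""

diffs = parse_max_diffs(sample_log)
print(first_layer_above(diffs, 1e-3))  # 18
print(first_layer_above(diffs, 1e-5))  # 8
```

With a threshold of 1e-3, the first layer reported is 18, which matches the log: layer 8 is only marginally above 1e-05, while layers 18 to 31 are off by whole units.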