Qwen/Qwen1.5-7B-Chat: RuntimeError: The serialized model is larger than the 2GiB limit (ORTModelForCausalLM)

from optimum.onnxruntime import ORTModelForCausalLM

base_model_name = "Qwen/Qwen1.5-7B-Chat"

ort_model = ORTModelForCausalLM.from_pretrained(
    base_model_name,
    use_io_binding=True,
    export=True,
)

RuntimeError: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.
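For context, the error means the exported graph plus weights exceeds protobuf's 2GiB serialization limit, so the weights have to be written as ONNX external data files next to model.onnx. A minimal sketch of exporting to a directory with optimum's programmatic exporter (the qwen_onnx output directory name is my own choice, not from the error message):

from optimum.exporters.onnx import main_export

# Export to a directory so weights over the 2GiB protobuf limit can be
# stored as external data files alongside model.onnx.
main_export(
    model_name_or_path="Qwen/Qwen1.5-7B-Chat",
    output="qwen_onnx",                  # hypothetical output directory
    task="text-generation-with-past",    # decoder export with KV cache
)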

I provided the file_name argument:

ort_model = ORTModelForCausalLM.from_pretrained(
    base_model_name,
    use_io_binding=True,
    export=True,
    file_name='/qwen_exp/model.onnx',
)

That raises a different error:

Traceback (most recent call last):
  File "/home/sr/test_qwen_ort.py", line 11, in <module>
    ort_model = ORTModelForCausalLM.from_pretrained(
  File "/home/sr//conda_env/anaconda3/envs/vaiq_onnx/lib/python3.9/site-packages/optimum/onnxruntime/modeling_ort.py", line 737, in from_pretrained
    return super().from_pretrained(
  File "/home/sr//conda_env/anaconda3/envs/vaiq_onnx/lib/python3.9/site-packages/optimum/modeling_base.py", line 438, in from_pretrained
    return from_pretrained_method(
TypeError: _from_transformers() got an unexpected keyword argument 'file_name'


@regisss I tried the steps as in


Perhaps this is a bug?
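A possible workaround, not a confirmed fix (the TypeError suggests file_name simply isn't forwarded to _from_transformers during export): export to a directory first, as in the main_export sketch above, and then load the already-exported model, so no file_name is needed at all. A minimal sketch, assuming the hypothetical qwen_onnx directory from the earlier step:

from optimum.onnxruntime import ORTModelForCausalLM

# Load the exported model from the directory; export=True and file_name are
# unnecessary because model.onnx and its external data files already exist there.
ort_model = ORTModelForCausalLM.from_pretrained(
    "qwen_onnx",
    use_io_binding=True,
)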