Problem with onnx export and usage

SnoozingSimian · June 25, 2022, 3:19pm

Hello everyone, I have been trying to speed up the GPT-Neo 1.3B model using ONNX and have run into significant issues.

I first exported the GPT-Neo 1.3B model using the Causal-LM feature. This created a folder with many files, including model.onnx.

I then tried running the ONNX model with ONNX Runtime, as shown on this page.

Here is the code I used.

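import onnxruntime as rt
from transformers import GPT2Tokenizer

# Assumption: model_name was not shown in the original snippet; this is the Hub ID for GPT-Neo 1.3B
model_name = "EleutherAI/gpt-neo-1.3B"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
# Execution providers in priority order
ONNX_PROVIDERS = ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']

session = rt.InferenceSession("onnx/model.onnx", providers=ONNX_PROVIDERS)

inputs = tokenizer("Using gpt-neo with ONNX Runtime and ", return_tensors="np")
outputs = session.run(output_names=["logits"], input_feed=dict(inputs))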

I used the %%time magic in the Jupyter cell and the above code took more than 5 minutes to execute.

After that I used a longer sentence and tried to run inference again, but the cell never finished executing (I waited for about an hour).

%%time
inputs = tokenizer("Using gpt-neo with ONNX Runtime again and this time with many more words which will put considerable load on the GPU as well as the CPU ", return_tensors="np")
outputs = session.run(output_names=["logits"], input_feed=dict(inputs))

I seem to be missing something, as I am certain this shouldn’t take so long. Could anyone please help me?
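
For anyone trying to reproduce this, here is a minimal sketch of how the slowdown might be narrowed down. It times the first call separately from later calls, and it filters the provider list down to what the local install reports. The helper names `time_first_and_rest` and `choose_providers` are hypothetical and not part of ONNX Runtime. One possible explanation, an assumption not confirmed here, is that the TensorRT provider spends a long time building an engine on the first run and again when the input shape changes. Timing with only the CUDA or CPU provider would be one way to check that.

```python
import time


def time_first_and_rest(run, repeats=3):
    """Time the first call of `run` separately from `repeats` later calls.

    `run` is any zero-argument callable, for example
    lambda: session.run(["logits"], dict(inputs)).
    Returns (first_call_seconds, [later_call_seconds, ...]).
    """
    start = time.perf_counter()
    run()
    first = time.perf_counter() - start

    rest = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        rest.append(time.perf_counter() - start)
    return first, rest


def choose_providers(preferred, available):
    """Keep only the preferred providers that are available, preserving order."""
    return [p for p in preferred if p in available]


# Hypothetical usage with the session from the post (left as comments so
# the helpers above can be run without the 1.3B model):
#
#   import onnxruntime as rt
#   providers = choose_providers(
#       ["CUDAExecutionProvider", "CPUExecutionProvider"],  # TensorRT left out on purpose
#       rt.get_available_providers(),
#   )
#   session = rt.InferenceSession("onnx/model.onnx", providers=providers)
#   print(session.get_providers())  # providers the session actually uses
#   first, rest = time_first_and_rest(
#       lambda: session.run(["logits"], dict(inputs))
#   )
```

If the first call is slow and later calls with the same input shape are fast, the cost is probably one-time setup, not per-token compute. If every call is slow, the session may be silently running on the CPU provider.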
