Using GPT-Neo-125M with ONNX

peterwilli · December 10, 2021, 3:57pm

Hey there,

I’m currently trying to export GPT-Neo-125M (EleutherAI/gpt-neo-125M · Hugging Face) to run in an ONNX session, since ONNX is said to be faster.

I’m trying to host a service that takes a prompt and returns a reply, using CPU inference.

Initial benchmarks were very promising, so I wanted to move ahead.
However, I can’t find any documentation online on how to get actual generated text out of it. I spent the rest of the day trying to reverse engineer it from the Hugging Face source code and some bits and pieces I found online (which were mostly about other models).

Here are my steps:

1. Convert an existing model to ONNX using: python3 -m transformers.onnx --model=EleutherAI/gpt-neo-125M onnx/model
2. Load the model using the following script: Endless-AWSW/test_onnx.py at main · peterwilli/Endless-AWSW · GitHub

I put test_onnx.py together from what I could find, but the tokens it returned were very strange:

awsw-dev@3564dfe571f0:/opt/awsw$ python3 test_onnx.py 
Loaded model
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 1 took 1.0165s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 2 took 1.0314s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 3 took 0.7820s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 4 took 1.1293s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 5 took 0.8726s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 6 took 0.9794s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 7 took 1.0277s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 8 took 0.7913s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 9 took 1.2652s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 10 took 1.1225s...

If anyone has any idea, please let me know. Searching DuckDuckGo and Google turned up nothing. All I could find were people using the model as a benchmark, with no decoding of the output back into text. Thanks a lot!
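
For readers hitting the same wall, here is a minimal sketch of the greedy decoding loop that turns per-step logits into token ids. It is not the script linked above. The names greedy_generate, next_token_logits and toy_logits are hypothetical, and the toy model only stands in for a real ONNX session so the loop can run on its own. One thing worth checking, stated as an assumption and not as something confirmed in this thread: an export without a --feature flag may contain the base model without the language-modeling head, in which case the output is hidden states and not vocabulary logits, and taking an argmax over it yields meaningless token ids. The transformers.onnx exporter accepts --feature=causal-lm for exporting with the head.

```python
from typing import Callable, List, Sequence


def greedy_generate(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: List[int],
    max_new_tokens: int = 20,
    eos_token_id: int = 50256,  # assumption: GPT-2 style end-of-text id
) -> List[int]:
    """Append the highest-scoring token at each step until EOS or the limit."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Scores over the vocabulary for the token that follows `ids`.
        logits = next_token_logits(ids)
        # Greedy choice: index of the largest logit.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if next_id == eos_token_id:
            break
    return ids


def toy_logits(ids: List[int]) -> List[float]:
    """Hypothetical stand-in model over a 5-token vocabulary.

    It always favours (last_id + 1) % 5. With a real export, this function
    would instead run the ONNX session on `ids` (assumed input names:
    input_ids and attention_mask) and return the logits of the last position.
    """
    vocab_size = 5
    target = (ids[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


if __name__ == "__main__":
    # Token 4 plays the role of EOS in this toy vocabulary.
    print(greedy_generate(toy_logits, [0], max_new_tokens=6, eos_token_id=4))
    # prints [0, 1, 2, 3, 4]
```

The returned ids would then go through the tokenizer's decode step to get text. Because greedy decoding is deterministic, every run on the same prompt gives the same output, which matches the identical lines in the log above.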

SnoozingSimian · June 25, 2022, 5:46pm

Did you ever get it to work? I am trying something similar and facing the same issues because of the sparse or non-existent documentation. If you found out how to get it to work, I would love to know your solution.

peterwilli · July 5, 2022, 11:21am

Hey there. Yes, I did. I can’t give exact instructions, but my mod on GitHub uses it. You can check out the sampler there.
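
As a rough illustration of what a sampler does in this setting, here is a minimal top-k sampling sketch with temperature. It is not the sampler from the mod mentioned above. The function name sample_top_k and its parameters are hypothetical, and it assumes the input is a plain list of next-token logits.

```python
import math
import random
from typing import Optional, Sequence


def sample_top_k(
    logits: Sequence[float],
    k: int = 40,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Draw one token id from the k highest-scoring logits."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if rng is None:
        rng = random.Random()
    # Indices of the k largest logits, best first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature scaling: values below 1 sharpen, values above 1 flatten.
    scaled = [logits[i] / temperature for i in top]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.Random.choices normalises the weights internally.
    return rng.choices(top, weights=weights, k=1)[0]
```

With k=1 this reduces to greedy decoding. Larger k and higher temperature trade determinism for variety.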

I spent months getting it to work, probably way too long, but at least I did it!

SnoozingSimian · July 5, 2022, 3:05pm

Could you please point me to the Git repo?
