Hey there,
I’m currently trying to export GPT-Neo-125M (EleutherAI/gpt-neo-125M · Hugging Face) to run in an ONNX Runtime session, since that is reported to be faster.
I want to host a service that takes a prompt and returns a generated reply, using CPU inference.
Initial benchmarks were very promising, so I wanted to move ahead.
However, I can’t find any documentation on how to actually get generated text back out of the exported model. I spent the rest of the day trying to reverse engineer it from the Hugging Face source code and some bits and pieces I found online (which were mostly about other models).
Here are my steps:
- Convert the model to ONNX using:
  python3 -m transformers.onnx --model=EleutherAI/gpt-neo-125M onnx/model
- Load the model using the following script: Endless-AWSW/test_onnx.py at main · peterwilli/Endless-AWSW · GitHub (a simplified sketch of the loading step is right below)
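For context, loading the exported model with onnxruntime itself seems to work fine; the loading step looks something like this (a minimal sketch, assuming the export command above wrote onnx/model/model.onnx):

import onnxruntime as ort

# Assumes the transformers.onnx export produced onnx/model/model.onnx
session = ort.InferenceSession("onnx/model/model.onnx", providers=["CPUExecutionProvider"])
print("inputs: ", [i.name for i in session.get_inputs()])
print("outputs:", [o.name for o in session.get_outputs()])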
test_onnx.py was pieced together from what I could find, but the tokens that come back are super weird:
awsw-dev@3564dfe571f0:/opt/awsw$ python3 test_onnx.py
Loaded model
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 1 took 1.0165s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 2 took 1.0314s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 3 took 0.7820s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 4 took 1.1293s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 5 took 0.8726s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 6 took 0.9794s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 7 took 1.0277s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 8 took 0.7913s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 9 took 1.2652s...
Here be dragons... trososequose trose�agososeactosososose�ag her trose�actosososososose
Test run 10 took 1.1225s...
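In case it helps, this is roughly the shape of the generation loop I’m attempting (a simplified sketch rather than the real test_onnx.py; it assumes the exported graph takes input_ids/attention_mask and that its first output is LM logits of shape [batch, seq, vocab]):

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
session = ort.InferenceSession("onnx/model/model.onnx", providers=["CPUExecutionProvider"])

prompt = "Here be dragons..."
input_ids = tokenizer(prompt, return_tensors="np")["input_ids"]

for _ in range(20):  # generate at most 20 new tokens
    outputs = session.run(None, {
        "input_ids": input_ids,
        "attention_mask": np.ones_like(input_ids),
    })
    # Greedy pick: highest-scoring entry at the last position of the first output.
    next_id = int(outputs[0][0, -1].argmax())
    input_ids = np.concatenate(
        [input_ids, np.array([[next_id]], dtype=input_ids.dtype)], axis=-1
    )

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))

If that first output isn’t actually the LM logits, I guess that would explain the garbage, but I couldn’t find anything in the docs confirming what the exported graph returns.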
If anyone has any idea, please let me know. DDG/Google turned up nothing: all I could find were people using the model as a benchmark, but nothing about decoding the output back into text. Thanks a lot!