Generate() without python for inference

mohotmoz · December 26, 2022, 6:43pm

Hello everyone,
As always, a huge thank you in advance to the HF team and community for such an amazing set of resources. I am looking at doing some inference in a relatively resource-constrained environment. I am familiar with the various model optimizations out there (OpenVINO, ONNX runtimes, etc.), and a huge shout-out again to HF for some really great resources and tutorials there. I would like to be able to run beam search (or potentially multinomial sampling). Right now I am using the amazing generate() function and everything it provides. Since we can get the models "out of Python" using ONNX, OpenVINO, etc., I was wondering if there are any best practices or documentation on getting the generate() function "out of Python" as well? I am looking at C and/or Rust for the current application.
Thank you as always!!
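
For reference, the beam-search part of the question comes down to a loop around a single "prefix in, next-token log-probabilities out" call, so it can be written without Python. Below is a minimal sketch in Rust, not a port of generate(). It assumes the exported model is wrapped in a step function. `toy_step`, `Beam`, and `beam_search` are hypothetical names made up for this illustration, and `toy_step` is a hard-coded stand-in for a real ONNX/OpenVINO inference call. The sketch leaves out the length penalty, logits processors, and key/value caching that generate() offers.

```rust
/// One partial hypothesis: tokens so far and their summed log-probability.
#[derive(Clone, Debug)]
struct Beam {
    tokens: Vec<usize>,
    score: f64,
    finished: bool,
}

/// Hypothetical stand-in for the exported model: maps a token prefix to
/// log-probabilities over a 4-token vocabulary (0 = BOS, 1 = EOS).
/// In a real application this would call the inference runtime instead.
fn toy_step(prefix: &[usize]) -> Vec<f64> {
    let probs: [f64; 4] = match prefix.last() {
        Some(0) => [0.0, 0.1, 0.6, 0.3],
        Some(2) => [0.0, 0.2, 0.1, 0.7],
        Some(3) => [0.0, 0.9, 0.05, 0.05],
        _ => [0.0, 1.0, 0.0, 0.0],
    };
    // ln(0.0) is negative infinity, which simply rules that token out.
    probs.iter().map(|p| p.ln()).collect()
}

/// Plain beam search: keep the `num_beams` highest-scoring hypotheses
/// at every step, carrying finished hypotheses along unchanged.
fn beam_search<F>(step: F, bos: usize, eos: usize, num_beams: usize, max_len: usize) -> Vec<Beam>
where
    F: Fn(&[usize]) -> Vec<f64>,
{
    let mut beams = vec![Beam { tokens: vec![bos], score: 0.0, finished: false }];
    for _ in 0..max_len {
        if beams.iter().all(|b| b.finished) {
            break;
        }
        let mut candidates: Vec<Beam> = Vec::new();
        for beam in &beams {
            if beam.finished {
                candidates.push(beam.clone());
                continue;
            }
            // One model call per live beam; extend it by every vocabulary token.
            let log_probs = step(&beam.tokens);
            for (token, lp) in log_probs.iter().enumerate() {
                let mut tokens = beam.tokens.clone();
                tokens.push(token);
                candidates.push(Beam {
                    tokens,
                    score: beam.score + lp,
                    finished: token == eos,
                });
            }
        }
        // Sort by descending score and keep the best `num_beams`.
        candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
        candidates.truncate(num_beams);
        beams = candidates;
    }
    beams
}
```

Calling `beam_search(toy_step, 0, 1, 2, 8)` returns the surviving hypotheses, best first. Replacing `toy_step` with a closure around the real runtime session is the only model-specific part. Multinomial sampling would use the same loop shape with a single hypothesis and a random draw in place of the sort.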
