Exporting Optimum Pipeline for Triton

changlan · August 20, 2022, 1:46am

Hi,

Is it possible to export the entire Optimum pipeline (e.g. generation) for serving on the Triton Inference Server? Ideally, the pipeline would include tokenization and decoding.

Thanks!

philschmid · August 20, 2022, 6:40am

Hey @changlan,

No, that's currently not possible. You would have to write the pre- and post-processing yourself using Triton's Python backend.
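
For anyone landing here, a rough sketch of what the pre-processing half could look like as a Python backend `model.py`. This is not an Optimum feature. The tensor names (`TEXT`, `INPUT_IDS`, `ATTENTION_MASK`), the `gpt2` tokenizer id, and the two helper functions are assumptions made for illustration, and they would have to match your own `config.pbtxt`.

```python
import numpy as np

try:
    # Provided by Triton inside the Python backend at runtime; not pip-installable.
    import triton_python_backend_utils as pb_utils
except ImportError:
    pb_utils = None  # lets the helpers below be tested outside Triton


def decode_text_input(raw: np.ndarray) -> list[str]:
    """Turn a BYTES tensor (numpy object array) into Python strings."""
    return [x.decode("utf-8") if isinstance(x, bytes) else str(x)
            for x in raw.reshape(-1)]


def encode_text_output(texts: list[str]) -> np.ndarray:
    """Pack strings into an object array, the layout used for BYTES tensors."""
    return np.array([t.encode("utf-8") for t in texts], dtype=object)


class TritonPythonModel:
    """Tokenization step; tensor names are assumptions and must match config.pbtxt."""

    def initialize(self, args):
        from transformers import AutoTokenizer
        # Assumption: "gpt2" stands in for whatever tokenizer your model uses.
        self.tokenizer = AutoTokenizer.from_pretrained("gpt2")
        # GPT-2 ships without a pad token, so reuse EOS to allow batch padding.
        self.tokenizer.pad_token = self.tokenizer.eos_token

    def execute(self, requests):
        responses = []
        for request in requests:
            raw = pb_utils.get_input_tensor_by_name(request, "TEXT").as_numpy()
            enc = self.tokenizer(decode_text_input(raw),
                                 return_tensors="np", padding=True)
            outputs = [
                pb_utils.Tensor("INPUT_IDS", enc["input_ids"].astype(np.int64)),
                pb_utils.Tensor("ATTENTION_MASK",
                                enc["attention_mask"].astype(np.int64)),
            ]
            responses.append(pb_utils.InferenceResponse(output_tensors=outputs))
        return responses
```

The post-processing half would mirror this: a second Python backend model that reads the generated token ids, calls `tokenizer.batch_decode(...)`, and returns the strings via something like `encode_text_output`. The two can then be chained around the exported model with a Triton ensemble, assuming your model's inputs and outputs line up with the tensor names chosen here.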
