Optimum & T5 for inference

Hello @echarlaix,

First, thanks a lot for the amazing work. I saw your draft PR (Add seq2seq ort inference by echarlaix · Pull Request #199 · huggingface/optimum · GitHub) and I was so excited to improve the speed of my models that I tried it.

I got the same problem as above, saying that T5 models are unsupported:

 File "/home/pierre/projects/openbook-models/.venv/lib/python3.9/site-packages/optimum/onnxruntime/utils.py", line 106, in check_supported_model_or_raise
    raise KeyError(
KeyError: "t5 model type is not supported yet.

And here is my testing code:


from pathlib import Path
from typing import Any

from transformers import AutoTokenizer, Text2TextGenerationPipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig


class MultipleText2TextGenerationPipeline(Text2TextGenerationPipeline):
    # Return multiple outputs per input (otherwise transformers hardcodes keeping only the first answer)
    def __call__(self, *args: list[Any], **kwargs: Any):
        # Call Pipeline.__call__ directly to skip Text2TextGenerationPipeline's single-output postprocessing
        results: list[list[dict[str, str]]] = super(Text2TextGenerationPipeline, self).__call__(*args, **kwargs)
        flatten_results: list[str] = []
        for result_list in results:
            for result_dict in result_list:
                flatten_results.append(result_dict["generated_text"].replace("question: ", ""))
        return flatten_results

class MySuperT5Model:
    def __init__(self, weights_cache_folder: str = "weights_cache"):
        # Assumed default; the original snippet did not show where weights_cache_folder comes from
        self.weights_cache_folder = Path(weights_cache_folder)
        self.weights_cache_folder.mkdir(exist_ok=True)

    def prepare(self):
        model_id = "mrm8488/t5-base-finetuned-question-generation-ap"
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        save_path = Path(self.weights_cache_folder, "optimum_model")
        save_path.mkdir(exist_ok=True)

        optimizer = ORTOptimizer.from_pretrained(model_id, feature="seq2seq-lm")
        opt_config = OptimizationConfig(optimization_level=99, optimize_for_gpu=True)

        optimizer.export(
            onnx_model_path=save_path / "model.onnx",
            onnx_optimized_model_output_path=save_path / "model-optimized.onnx",
            optimization_config=opt_config,
        )

        optimizer.model.config.save_pretrained(save_path)
        model = ORTModelForSeq2SeqLM.from_pretrained(save_path, file_name="model-optimized.onnx")
        self.onnx_clx = MultipleText2TextGenerationPipeline(model=model, tokenizer=tokenizer, device=0)

    def __call__(self, input_texts: list[str]) -> list[list[str]]:
        # DEFAULT_GENERATOR_OPTIONS has batch_size=8 and num_return_sequences=3, plus all the rest (see the sketch right below)
        output_texts: list[str] = self.onnx_clx(input_texts, **DEFAULT_GENERATOR_OPTIONS)
        # some_batching_logic... generated_questions=.....
        return generated_questions
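
For context, DEFAULT_GENERATOR_OPTIONS and the call site look roughly like this. Apart from batch_size=8 and num_return_sequences=3 (the values mentioned in the comment above), the options here are just illustrative placeholders, and prepare() is the call that currently fails with the KeyError:

DEFAULT_GENERATOR_OPTIONS = {
    "batch_size": 8,            # real value from the comment above
    "num_return_sequences": 3,  # real value from the comment above
    "num_beams": 3,             # placeholder, just needs to be >= num_return_sequences
    "max_length": 64,           # placeholder
}

t5_model = MySuperT5Model()
t5_model.prepare()  # currently raises KeyError: "t5 model type is not supported yet."
# Calling the pipeline directly here, since the batching logic in __call__ is elided above
questions = t5_model.onnx_clx(
    ["answer: ONNX Runtime  context: Optimum relies on ONNX Runtime to speed up inference."],
    **DEFAULT_GENERATOR_OPTIONS,
)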

I would be willing to spend some time integrating T5, as it's really important for me to have this model as lightweight and fast as possible.
It should be feasible, as T5 is already listed in Export 🤗 Transformers Models.
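
For instance, the plain transformers.onnx export already knows about T5. Here is a quick sketch (assuming the Python export API; it produces a single graph, which is not necessarily the layout ORTModelForSeq2SeqLM expects, but it shows the T5 ONNX config is there):

from pathlib import Path

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from transformers.onnx import export
from transformers.onnx.features import FeaturesManager

model_id = "mrm8488/t5-base-finetuned-question-generation-ap"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Look up the ONNX config registered for T5 with the seq2seq-lm feature
model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(
    model, feature="seq2seq-lm"
)
onnx_config = model_onnx_config(model.config)

# Export to ONNX with the default opset for this config
onnx_inputs, onnx_outputs = export(
    tokenizer, model, onnx_config, onnx_config.default_onnx_opset, Path("t5-qg.onnx")
)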

I know it's only a draft and I'm really sorry that I used an early work PR :confused: However, it works great, bravo! If I can be of any help please let me know.

Thanks in advance,
Have a great day.