Optimum & T5 for inference


Last week I saw the following announcement about Optimum v1.1:

We released πŸ€— Optimum v1.1 this week to accelerate Transformers with new [ONNX Runtime](https://www.linkedin.com/company/onnxruntime/) tools:

🏎 Train models up to 30% faster (for models like T5) with ORTTrainer!
DeepSpeed is natively supported out of the box. 😍

🏎 Accelerate inference using static and dynamic quantization with ORTQuantizer!
Get >=99% of the original FP32 model's accuracy, with up to 3x speedup and up to 4x size reduction


To test it with a T5 model for inference, I went to the Optimum GitHub repo, copied the quantization example code into a Colab notebook, and set `model_checkpoint` and `feature` as follows:

```shell
!python -m pip install optimum[onnxruntime]
!pip install sentencepiece
```

```python
model_checkpoint = "mrm8488/t5-base-finetuned-question-generation-ap"
feature = "text2text-generation"

from optimum.onnxruntime.configuration import AutoQuantizationConfig
from optimum.onnxruntime import ORTQuantizer
from functools import partial
from datasets import Dataset

# Tokenize the inputs
def preprocess_fn(ex, tokenizer):
    return tokenizer(ex["sentence"])

# Create a dataset or load one from the Hub
ds = Dataset.from_dict({"sentence": ["answer: Manuel context: Manuel has created RuPERTa-base with the support of HF-Transformers and Google"]})

# The type of quantization to apply
qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)
quantizer = ORTQuantizer.from_pretrained(model_checkpoint, feature=feature)

# Quantize the model!
quantizer.export(
    onnx_model_path="model.onnx",
    onnx_quantized_model_output_path="model-quantized.onnx",
    quantization_config=qconfig,
)
```
Then, I ran the code but it gave an error:

The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.


```
KeyError                                  Traceback (most recent call last)

<ipython-input-9-56024da8abbc> in <module>()
----> 1 get_ipython().run_cell_magic('time', '', '# The type of quantization to apply\nqconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)\nquantizer = ORTQuantizer.from_pretrained(model_checkpoint, feature=feature)\n\n# Quantize the model!\nquantizer.export(\n    onnx_model_path="model.onnx",\n    onnx_quantized_model_output_path="model-quantized.onnx",\n    quantization_config=qconfig,\n)')

4 frames

<decorator-gen-53> in time(self, line, cell, local_ns)

<timed exec> in <module>()

/usr/local/lib/python3.7/dist-packages/transformers/onnx/features.py in get_model_class_for_feature(feature, framework)
    362         if task not in task_to_automodel:
    363             raise KeyError(
--> 364                 f"Unknown task: {feature}. "
    365                 f"Possible values are {list(FeaturesManager._TASKS_TO_AUTOMODELS.values())}"
    366             )

KeyError: "Unknown task: text-generation. Possible values are
 [<class 'transformers.models.auto.modeling_auto.AutoModel'>,
 <class 'transformers.models.auto.modeling_auto.AutoModelForMaskedLM'>,
 <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,
 <class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>,
 <class 'transformers.models.auto.modeling_auto.AutoModelForSequenceClassification'>,
 <class 'transformers.models.auto.modeling_auto.AutoModelForTokenClassification'>,
 <class 'transformers.models.auto.modeling_auto.AutoModelForMultipleChoice'>,
 <class 'transformers.models.auto.modeling_auto.AutoModelForQuestionAnswering'>,
 <class 'transformers.models.auto.modeling_auto.AutoModelForImageClassification'>]"
```

Then I went to the Optimum documentation, but it does not appear to be up to date, and I did not find a solution there.

Does this mean that Optimum v1.1 cannot be used for T5 inference?

Hi @pierreguillou

Optimum currently does not support ONNX Runtime inference for T5 models (or any other encoder-decoder models).

We are however planning to integrate this feature in the near future.

Also, the error you get does not come from inference but from the `feature` you chose when exporting the model to the ONNX format.

You can try:

`feature = "seq2seq-lm"`
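To make the failure mode concrete, here is a simplified, hypothetical sketch of the feature-to-AutoModel lookup that raises the `KeyError` above (this is not the actual `transformers/onnx/features.py` code; `TASK_TO_AUTOMODEL` and `model_class_for_feature` are illustrative names, with the mapping reconstructed from the classes listed in the traceback):

```python
# Illustrative stand-in for the export-feature lookup: the chosen `feature`
# string is used as a dictionary key, so a name that is valid for pipelines
# ("text2text-generation") is not necessarily a valid ONNX export feature
# ("seq2seq-lm").
TASK_TO_AUTOMODEL = {
    "default": "AutoModel",
    "masked-lm": "AutoModelForMaskedLM",
    "causal-lm": "AutoModelForCausalLM",
    "seq2seq-lm": "AutoModelForSeq2SeqLM",
    "sequence-classification": "AutoModelForSequenceClassification",
    "token-classification": "AutoModelForTokenClassification",
    "multiple-choice": "AutoModelForMultipleChoice",
    "question-answering": "AutoModelForQuestionAnswering",
    "image-classification": "AutoModelForImageClassification",
}

def model_class_for_feature(feature: str) -> str:
    # "-with-past" variants resolve to the same base task
    task = feature.replace("-with-past", "")
    if task not in TASK_TO_AUTOMODEL:
        raise KeyError(f"Unknown task: {feature}. "
                       f"Possible values are {sorted(TASK_TO_AUTOMODEL)}")
    return TASK_TO_AUTOMODEL[task]

print(model_class_for_feature("seq2seq-lm"))  # AutoModelForSeq2SeqLM

try:
    model_class_for_feature("text2text-generation")
except KeyError as err:
    print("export feature rejected:", err)
```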


Thank you @echarlaix for your answer.

`feature = "seq2seq-lm"` allows me to run the code from my post, but not to use the ONNX model for inference, as you said.

That is, the following code fails:

```python
from optimum.onnxruntime import ORTModel

# Load the quantized model
ort_model = ORTModel("model-quantized.onnx", quantizer._onnx_config)

tokenized_ds = ds.map(partial(preprocess_fn, tokenizer=quantizer.tokenizer))

# The code fails at this line
ort_outputs = ort_model.evaluation_loop(tokenized_ds)
```

And what about Intel Neural Compressor, as cited in Optimizing models towards inference? Does it work for T5 inference?

Another question: where can I find the list of features such as `seq2seq-lm`? Thanks.

Yes exactly, it is not yet possible to use ORTModel to perform ONNX Runtime inference for such models.

Unfortunately, quantization and pruning support for Intel Neural Compressor has not yet been integrated into the text generation examples, but it is something we plan to work on in the coming months.

You can find the features available for exporting models for different types of topologies or tasks here.
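You can also inspect the supported features from code. A minimal sketch, assuming a `transformers` version that ships the `transformers.onnx` module (as in the traceback above); the exact module path and method may vary across versions:

```python
# List the ONNX export features supported for T5, if transformers.onnx
# is available in this environment.
try:
    from transformers.onnx.features import FeaturesManager
    features = sorted(FeaturesManager.get_supported_features_for_model_type("t5"))
except ImportError:
    features = []  # transformers.onnx not available in this environment

print(features)
```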
