I want to speed up inference for zero-shot classification, and I am planning to use the facebook/bart-large-mnli model. I want to use ONNX to speed it up. The official docs all use some BERT model in their examples, and onnx_transformers is no longer up to date.
When I tried to export the model according to this article, I used the following command:
python -m transformers.onnx --model=facebook/bart-large-mnli --features=sequence-classification --atol=1e-4 onnx_zer_shot/
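The export itself completes. I then load the result with onnxruntime roughly like this (assuming model.onnx, the file name the exporter writes into the output directory):

    import onnxruntime as ort

    # Load the model exported by transformers.onnx above
    session = ort.InferenceSession("onnx_zer_shot/model.onnx")

    # Inspect the input names the graph expects;
    # I expect ['input_ids', 'attention_mask'] here
    print([inp.name for inp in session.get_inputs()])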
Now the problem is giving the inputs to session.run. When I manually run the HF model (without using a pipeline), I use the following input:
x = tokenizer.encode(premise, hypothesis, return_tensors='pt', truncation='only_first')  # truncation_strategy is the deprecated name for this argument
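For context, the rest of my manual run follows the snippet from the model card, so the label indices below come from there (index 0 is contradiction, index 2 is entailment):

    # Continues from the x defined above; nli_model is defined in the NOTE below
    logits = nli_model(x)[0]
    entail_contradiction_logits = logits[:, [0, 2]]  # drop the neutral logit
    probs = entail_contradiction_logits.softmax(dim=1)
    prob_label_is_true = probs[:, 1]  # probability of entailment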
But this does not work for the ONNX session.run: it expects a dictionary with the keys ['attention_mask', 'input_ids']. These are the same inputs the HF model takes, but the HF model accepts the encoded tensor directly anyway.
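Since tokenizer.encode only returns the input_ids (there is no attention_mask in its output), my guess is that I have to call the tokenizer itself to build the feed dictionary, roughly like this (session is the InferenceSession from above; I am not sure this is the intended way):

    # Calling the tokenizer directly returns both input_ids and attention_mask;
    # return_tensors="np" gives numpy arrays, which onnxruntime expects
    enc = tokenizer(premise, hypothesis, return_tensors="np", truncation="only_first")
    ort_inputs = {
        "input_ids": enc["input_ids"],
        "attention_mask": enc["attention_mask"],
    }
    logits = session.run(None, ort_inputs)[0]

Is this the right way to build the inputs?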
NOTE: nli_model above is nli_model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")
Any help is appreciated.