AttributeError: 'NoneType' object has no attribute 'pad_token'

Has anyone experienced this problem? I am using Google Colab to test optimising Hugging Face Transformers models with Optimum, using models such as:

MODEL = "joeddav/xlm-roberta-large-xnli" or
MODEL = "MoritzLaurer/mDeBERTa-v3-base-mnli-xnli"

and converting them to the ONNX version:

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer
from pathlib import Path

MODEL = "MoritzLaurer/mDeBERTa-v3-base-mnli-xnli"

onnx_path = Path("onnx")

# load vanilla transformers and convert to onnx

model = ORTModelForSequenceClassification.from_pretrained(MODEL, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL)

# save onnx checkpoint and tokenizer

model.save_pretrained(onnx_path)
tokenizer.save_pretrained(onnx_path)

and then using the model in a zero-shot classification pipeline:

from transformers import pipeline

TASK = "zero-shot-classification"

vanilla_zero_shot = pipeline(TASK, model, device=0)

print(f"pipeline is loaded on device {vanilla_zero_shot.model.device}")

print(vanilla_zero_shot(purposes[0], industry_sectors, multi_label=True))

I get the following error:
AttributeError Traceback (most recent call last)
in
5 vanilla_zero_shot = pipeline(TASK, model, device=0)
6 print(f"pipeline is loaded on device {vanilla_zero_shot.model.device}")
----> 7 print(vanilla_zero_shot(purposes[0], industry_sectors, multi_label=True))

/usr/local/lib/python3.9/dist-packages/transformers/pipelines/zero_shot_classification.py in _parse_and_tokenize(self, sequence_pairs, padding, add_special_tokens, truncation, **kwargs)
107         """
108 return_tensors = self.framework
---> 109         if self.tokenizer.pad_token is None:
110 # Override for tokenizers not supporting padding
111 logger.error(

AttributeError: 'NoneType' object has no attribute 'pad_token'

It works when using a regular AutoModelForSequenceClassification, and I can't figure out why it doesn't work here. Perhaps these models are not supported by the ONNX runtime?

Hi @gustavv99, with the pipeline from Transformers you’ll have to also provide the tokenizer as follows:

vanilla_zero_shot = pipeline(TASK, model=model, tokenizer=tokenizer, device=0)
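To see why the original call fails, here is a toy sketch (not the actual transformers internals; the class and tokenizer here are made up for illustration): when only the model is passed and the pipeline cannot infer a tokenizer from it, `self.tokenizer` stays `None`, so the padding check raises the `AttributeError` above. Passing the tokenizer explicitly fixes it.

```python
class ToyZeroShotPipeline:
    """Minimal stand-in for a transformers pipeline (illustrative only)."""

    def __init__(self, model, tokenizer=None):
        self.model = model
        self.tokenizer = tokenizer  # stays None if not provided

    def __call__(self, sequence):
        # Mirrors the check in zero_shot_classification.py that raised the error:
        # accessing .pad_token on a None tokenizer raises AttributeError.
        if self.tokenizer.pad_token is None:
            raise ValueError("tokenizer does not support padding")
        return f"classified: {sequence}"


class ToyTokenizer:
    """Hypothetical tokenizer with a pad token, standing in for AutoTokenizer."""
    pad_token = "<pad>"


# Reproduce the failure: no tokenizer passed.
broken = ToyZeroShotPipeline(model=object())
try:
    broken("some text")
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'pad_token'

# The fix: pass the tokenizer explicitly, as in the answer above.
working = ToyZeroShotPipeline(model=object(), tokenizer=ToyTokenizer())
print(working("some text"))
```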