I’m trying to run sequence classification with a trained DistilBERT model, but I can’t get truncation to work properly and I keep getting this error:
RuntimeError: The size of tensor a (N) must match the size of tensor b (512) at non-singleton dimension 1
I can work around it by manually truncating all the documents I pass into the classifier, but that’s really not ideal.
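For reference, here is a minimal sketch of what that kind of manual workaround could look like, assuming a Hugging Face tokenizer object. The helper name `truncate_text` and the 510-token budget (512 minus the [CLS] and [SEP] special tokens) are illustrative assumptions, not part of the original setup.

```python
def truncate_text(tokenizer, text, max_tokens=510):
    """Cut `text` down to roughly `max_tokens` tokens (hypothetical helper)."""
    # Tokenize without [CLS]/[SEP] so the budget counts only content tokens;
    # truncation=True with max_length drops everything past the budget.
    ids = tokenizer.encode(
        text,
        add_special_tokens=False,
        truncation=True,
        max_length=max_tokens,
    )
    # Convert the kept token ids back to a string to hand to the pipeline.
    return tokenizer.decode(ids)
```

It would be applied to each document before classification, e.g. `short_docs = [truncate_text(tokenizer, d) for d in docs]`. Decoding can change whitespace or casing, so the re-tokenized length may differ slightly from the budget.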
Here is my setup for the pipeline:
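from transformers import (
    AutoTokenizer,
    DistilBertConfig,
    DistilBertForSequenceClassification,
    TextClassificationPipeline,
)

model_dir = "./classifier_52522_3"
# Truncation/padding settings passed at load time (these do not seem to be applied by the pipeline)
tokenizer = AutoTokenizer.from_pretrained(
    model_dir, model_max_length=512, max_length=512, padding="max_length", truncation=True
)
config = DistilBertConfig.from_pretrained(model_dir)
# Load the trained weights; DistilBertForSequenceClassification(config) alone would create a randomly initialized model
model = DistilBertForSequenceClassification.from_pretrained(model_dir, config=config)
pipe = TextClassificationPipeline(
    model=model,
    tokenizer=tokenizer,
    return_all_scores=True,
)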
I have also tried adding the truncation parameters directly to the saved tokenizer_config.json file, but that made no difference.
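One more option, sketched here, is to pass the truncation settings when calling the pipeline rather than when loading the tokenizer. This assumes a recent transformers release in which TextClassificationPipeline forwards extra call-time keyword arguments to the tokenizer; `classify_truncated` is a hypothetical helper name.

```python
def classify_truncated(pipe, texts, max_length=512):
    """Run `pipe` on `texts`, asking for truncation at call time (hypothetical helper)."""
    # Assumption: the pipeline passes these keyword arguments through to the
    # tokenizer, so inputs longer than `max_length` tokens are cut off
    # before they reach the model.
    return pipe(texts, truncation=True, max_length=max_length)
```

With the `pipe` object from the setup, this would be called as `results = classify_truncated(pipe, documents)`, where `documents` is a list of strings.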
Thanks!