Tokenizer truncation

I’m trying to run sequence classification with a trained DistilBERT model, but I can’t get truncation to work properly, and I keep getting this error:

RuntimeError: The size of tensor a (N) must match the size of tensor b (512) at non-singleton dimension 1.
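The 512 in that message very likely comes from DistilBERT’s learned position embeddings, which only cover 512 positions. The NumPy sketch below is a simplified mimic of that step, not the library code: it uses a made-up hidden size of 8, and the function name `embed_shapes` is hypothetical. It shows how adding a position slice that stops at 512 rows to the embeddings of a longer input fails. In PyTorch the same mismatch surfaces as the RuntimeError above.

```python
import numpy as np

# Assumed sizes: DistilBERT-base has 512 position embeddings; the hidden size
# here (8) is made up to keep the arrays tiny.
MAX_POSITIONS = 512
HIDDEN = 8


def embed_shapes(seq_len: int) -> np.ndarray:
    """Mimic word + position embedding addition for a batch of one sequence."""
    word_embeddings = np.zeros((1, seq_len, HIDDEN))
    # The position table only has MAX_POSITIONS rows, so slicing past its end
    # silently returns at most 512 rows instead of seq_len rows.
    position_table = np.zeros((1, MAX_POSITIONS, HIDDEN))
    position_embeddings = position_table[:, :seq_len]
    # Fails with a shape mismatch whenever seq_len > MAX_POSITIONS.
    return word_embeddings + position_embeddings


print(embed_shapes(300).shape)
try:
    embed_shapes(600)
except ValueError as err:
    print("shape mismatch:", err)
```

Any input longer than 512 tokens therefore has to be cut down before it reaches the model.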

I can work around it by manually truncating all the documents I pass into the classifier, but that’s really not ideal.
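Truncating the raw text by hand is awkward because the limit is 512 tokens, not characters or words. For a single sequence, tokenizer truncation with `max_length=512` roughly keeps the special tokens and drops content tokens from the end. The pure-Python sketch below illustrates that idea on token IDs. The helper `truncate_ids` is hypothetical, and the IDs 101/102 for `[CLS]`/`[SEP]` are an assumption (they match the uncased BERT/DistilBERT vocabulary).

```python
from typing import List


def truncate_ids(
    token_ids: List[int], max_length: int = 512, cls_id: int = 101, sep_id: int = 102
) -> List[int]:
    """Keep [CLS] + the first (max_length - 2) content tokens + [SEP].

    token_ids are the content tokens only, without special tokens.
    """
    if max_length < 2:
        raise ValueError("max_length must leave room for [CLS] and [SEP]")
    # Drop tokens from the right, as right-side truncation does for one sequence.
    body = token_ids[: max_length - 2]
    return [cls_id] + body + [sep_id]


print(len(truncate_ids(list(range(1000, 1700)))))
```

The tokenizer already does this internally when truncation is actually switched on, which is why truncating documents by hand should not be necessary.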

Here is my setup for the pipeline:

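from transformers import (
    AutoTokenizer,
    DistilBertConfig,
    DistilBertForSequenceClassification,
    TextClassificationPipeline,
)

model_dir = "./classifier_52522_3"
tokenizer = AutoTokenizer.from_pretrained(
    model_dir, model_max_length=512, max_length=512, padding="max_length", truncation=True
)
config = DistilBertConfig.from_pretrained(model_dir)
# Load the trained weights from model_dir; building the class from the config alone would randomly initialize them
model = DistilBertForSequenceClassification.from_pretrained(model_dir, config=config)

pipe = TextClassificationPipeline(
    model=model,
    tokenizer=tokenizer,
    return_all_scores=True
)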
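Keyword arguments such as `truncation=True` given to `AutoTokenizer.from_pretrained` appear to be stored on the tokenizer rather than applied on every call, so they may not take effect inside the pipeline. A likely fix, assuming a `transformers` version recent enough that `TextClassificationPipeline` forwards extra call-time keyword arguments to its tokenizer, is to pass them when calling the pipeline. The wrapper `classify` below is hypothetical and only shows where the arguments go.

```python
from typing import Any, Callable, List


def classify(pipe: Callable[..., Any], texts: List[str], max_length: int = 512) -> Any:
    """Run a text-classification pipeline with truncation enabled per call.

    Assumes `pipe` forwards truncation/max_length to its tokenizer.
    """
    return pipe(texts, truncation=True, max_length=max_length)
```

With the pipeline above this is equivalent to `pipe(docs, truncation=True, max_length=512)`, where `docs` stands for your list of documents.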

I have also tried adding the truncation params directly to the saved tokenizer_config.json file, but no dice.
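Entries in tokenizer_config.json are, as far as I can tell, loaded as tokenizer init arguments, which would explain why a hand-added truncation key has no effect at call time. If the saved file should drive the limit, one workaround is to read `model_max_length` from it and pass the values explicitly when calling the pipeline. The helper `tokenizer_call_kwargs` is hypothetical, and the 512 cap is an assumption based on DistilBERT’s position limit (some saved configs contain a very large placeholder value).

```python
import json
from pathlib import Path


def tokenizer_call_kwargs(model_dir: str, limit: int = 512) -> dict:
    """Build per-call truncation kwargs from a saved tokenizer_config.json.

    Uses model_max_length from the file when present, never exceeding `limit`.
    """
    config_path = Path(model_dir) / "tokenizer_config.json"
    max_length = limit
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        # Only model_max_length is read; other entries (e.g. "truncation") are ignored.
        max_length = min(int(config.get("model_max_length", limit)), limit)
    return {"truncation": True, "max_length": max_length}
```

Usage would look like `pipe(docs, **tokenizer_call_kwargs(model_dir))`.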

Thanks!