Pipeline throws "index out of range in self" unless device is mps

When the input text is longer than 512 tokens, the pipeline throws an "index out of range in self" error, as you can see below:

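from transformers import (
    AutoModelForSequenceClassification,
    PreTrainedTokenizerFast,
    TextClassificationPipeline,
)

# model_name is the path to the model directory, which also holds tokenizer.json
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = PreTrainedTokenizerFast(tokenizer_file=f"{model_name}/tokenizer.json")
pipeline = TextClassificationPipeline(
    model=model,
    tokenizer=tokenizer
)
result = pipeline(input)  # input is a text longer than 512 tokens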
result
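
For what it's worth, the same message can be reproduced in isolation with a plain `torch.nn.Embedding` on CPU. This is only a minimal sketch, and it assumes (I have not confirmed this for my model) that the model has a position embedding table with 512 rows. The table size and the variable names below are made up for illustration:

```python
import torch

# Hypothetical stand-in for a model's position embedding table:
# 512 positions, 8-dimensional vectors (sizes chosen for illustration only).
position_embeddings = torch.nn.Embedding(num_embeddings=512, embedding_dim=8)

# Valid position ids are 0..511, so a 512-token input is fine.
ok = position_embeddings(torch.arange(512))
print(ok.shape)

# A 513-token input asks for position id 512, which is past the end of the table.
error_message = None
try:
    position_embeddings(torch.arange(513))
except IndexError as err:
    error_message = str(err)

print(error_message)
```

If this is what happens inside the pipeline, any input that tokenizes to more than 512 tokens would hit the same lookup on CPU.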

But if I pass device to the pipeline, it runs fine even though the input is the same:

import torch

model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = PreTrainedTokenizerFast(tokenizer_file=f"{model_name}/tokenizer.json")
pipeline = TextClassificationPipeline(
    model=model,
    tokenizer=tokenizer,
    device=torch.device("mps")  # Apple Silicon GPU backend
)
result = pipeline(input)
result

To make it work without setting the device, I have to pass truncation and max_length when calling the pipeline. But haven't I already specified these parameters in tokenizer.json?

model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = PreTrainedTokenizerFast(tokenizer_file=f"{model_name}/tokenizer.json")
# No device argument here, so the pipeline uses its default device
pipeline = TextClassificationPipeline(
    model=model,
    tokenizer=tokenizer
)
# Truncation settings are passed explicitly at call time
result = pipeline(input, truncation=True, max_length=512)
result
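
To narrow this down, here is a small self-contained sketch that builds a toy tokenizer in memory instead of loading my tokenizer.json. The vocabulary, the limit of 8 tokens, and the variable names are all made up for illustration. It compares the raw `tokenizers` backend, which has truncation enabled, with the `PreTrainedTokenizerFast` wrapper called with and without explicit truncation arguments. My assumption, which I have not verified across library versions, is that the wrapper only truncates when asked to at call time:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

# Toy vocabulary (hypothetical, for illustration only).
vocab = {"[PAD]": 0, "[UNK]": 1, "hello": 2}
backend = Tokenizer(WordLevel(vocab=vocab, unk_token="[UNK]"))
backend.pre_tokenizer = Whitespace()

# Same idea as the "truncation" section of tokenizer.json, with a tiny limit.
backend.enable_truncation(max_length=8)

text = " ".join(["hello"] * 20)  # 20 tokens, well over the limit of 8

# The raw backend applies its own truncation setting.
backend_len = len(backend.encode(text).ids)
print("backend:", backend_len)

# The wrapper, called the way the pipeline calls it when no arguments are given.
wrapped = PreTrainedTokenizerFast(tokenizer_object=backend)
default_len = len(wrapped(text)["input_ids"])
print("wrapper, no arguments:", default_len)

# The wrapper with truncation requested explicitly at call time.
explicit_len = len(wrapped(text, truncation=True, max_length=8)["input_ids"])
print("wrapper, truncation=True:", explicit_len)
```

If the middle number comes out larger than 8, the wrapper is not using the truncation setting stored in the backend, which would match what I see with the pipeline.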

I have truncation and padding enabled in the tokenizer.json file:

{
  "version": "1.0",
  "truncation": {
    "direction": "Right",
    "max_length": 512,
    "strategy": "LongestFirst",
    "stride": 0
  },
  "padding": {
    "strategy": {
      "Fixed": 512
    },
    "direction": "Right",
    "pad_to_multiple_of": null,
    "pad_id": 0,
    "pad_type_id": 0,
    "pad_token": "[PAD]"
  },
  "added_tokens": [],
  "normalizer": null,
  "pre_tokenizer": {
    "type": "ByteLevel",
    "add_prefix_space": false,
    "trim_offsets": true,
    "use_regex": true
  },
  "post_processor": {
    "type": "BertProcessing",
    "sep": [
      "</s>",
      2
    ],
    "cls": [
      "<s>",
      0
    ]
  },
  "decoder": {
    "type": "ByteLevel",
    "add_prefix_space": true,
    "trim_offsets": true,
    "use_regex": true
  }
}
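
To double-check that the settings are really present in the file, the JSON can be inspected with the standard library. This sketch embeds a trimmed copy of the truncation and padding sections above as a string so that it runs on its own. With the real file, `json.load` on the opened tokenizer.json would take the place of the string:

```python
import json

# Trimmed copy of the relevant sections of tokenizer.json (illustration only).
tokenizer_json = """
{
  "version": "1.0",
  "truncation": {
    "direction": "Right",
    "max_length": 512,
    "strategy": "LongestFirst",
    "stride": 0
  },
  "padding": {
    "strategy": {"Fixed": 512},
    "direction": "Right",
    "pad_id": 0,
    "pad_token": "[PAD]"
  }
}
"""

config = json.loads(tokenizer_json)

# Either section is null/absent when the feature is disabled in the file.
truncation = config.get("truncation")
padding = config.get("padding")

print("truncation max_length:", truncation["max_length"])
print("padding strategy:", padding["strategy"])
```

This only confirms what is stored in the file. It says nothing about whether the pipeline actually applies these settings.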

I am completely new to this. I am wondering why device("mps") makes it work, and why truncation doesn't work as specified in the JSON file. Any help would be greatly appreciated.