I trained a bert-base-uncased AutoModelForSequenceClassification model and found that inference is at least 2x faster if I comment out padding = 'max_length' in the encoding step. My understanding is that BERT expects a fixed-length input of 512 tokens… doesn't that imply the input must be padded to 512? Please help me understand.
sequence = tokenizer.encode_plus(question,
                                 passage,
                                 max_length=256,
                                 padding='max_length',
                                 truncation="longest_first",
                                 return_tensors="pt")['input_ids'].to(device)