DataCollator not padding as expected

I’m trying to retrain an NER model. When I apply DataCollatorForTokenClassification to pad the tokenized inputs and labels, it raises an error:

/usr/local/lib/python3.7/dist-packages/transformers/data/data_collator.py in <dictcomp>(.0)
    326             ]
    327 
--> 328         batch = {k: torch.tensor(v, dtype=torch.int64) for k, v in batch.items()}
    329         return batch
    330 

ValueError: expected sequence of length 512 at dim 1 (got 513)
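To isolate the problem, this is roughly how the collator is being applied (a minimal sketch with made-up feature values; the real features come from the tokenized dataset):

from transformers import AutoTokenizer, DataCollatorForTokenClassification

model_checkpoint = "pierreguillou/ner-bert-large-cased-pt-lenerbr"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

data_collator = DataCollatorForTokenClassification(
    tokenizer,
    max_length=512,
    padding="max_length",
    label_pad_token_id=-100,
)

# Two already-tokenized features of different lengths; the collator should
# pad both input_ids and labels up to max_length=512.
features = [
    {"input_ids": [101, 7592, 102], "labels": [-100, 1, -100]},
    {"input_ids": [101, 7592, 2088, 102], "labels": [-100, 1, 2, -100]},
]

batch = data_collator(features)
print(batch["input_ids"].shape, batch["labels"].shape)  # expected: torch.Size([2, 512]) for both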
  • Tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_checkpoint,
    max_length=512,
    truncation=True,
    padding="max_length",
)

PreTrainedTokenizerFast(name_or_path='pierreguillou/ner-bert-large-cased-pt-lenerbr', vocab_size=29794, model_max_len=1000000000000000019884624838656, is_fast=True, padding_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'})
  • Data collator
DataCollatorForTokenClassification(
    tokenizer,
    max_length=512,
    padding="max_length",
    label_pad_token_id=-100)

DataCollatorForTokenClassification(tokenizer=PreTrainedTokenizerFast(name_or_path='pierreguillou/ner-bert-large-cased-pt-lenerbr', vocab_size=29794, model_max_len=1000000000000000019884624838656, is_fast=True, padding_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}), padding='max_length', max_length=512, pad_to_multiple_of=None, label_pad_token_id=-100, return_tensors='pt')
  • Tokenized training example (the preprocessing that produced it is sketched after the token list)
['[CLS]',
 'analis',
 '##e',
 'da',
 'defesa',
 'da',
 'interessa',
 '##da',
 'pres',
 '##tadora',
 'do',
 'servi',
 '##co',
 'de',
 'comunica',
 '##ca',
 '##o',
 'multi',
 '##mid',
 '##ia',
 '-',
 's',
 '##c',
 '##m',
 'e',
 'servi',
 '##co',
 'telef',
 '##oni',
 '##co',
 'fixo',
 'comu',
 '##tado',
 '-',
 's',
 '##t',
 '##f',
 '##c',
 '[SEP]']
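For context, the labels are aligned to the tokenized words roughly like this (a sketch; the column names "tokens" and "ner_tags" and the raw_datasets variable are assumptions, and the call relies on the truncation/padding settings passed to from_pretrained above):

def tokenize_and_align_labels(examples):
    # Pre-split words in, subword tokens out; truncation/max_length are expected
    # to come from the settings given to from_pretrained above.
    tokenized = tokenizer(examples["tokens"], is_split_into_words=True)
    all_labels = []
    for i, word_labels in enumerate(examples["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        # Special tokens ([CLS], [SEP]) get -100 so the loss ignores them;
        # every subword of a word receives that word's label.
        all_labels.append([-100 if w is None else word_labels[w] for w in word_ids])
    tokenized["labels"] = all_labels
    return tokenized

tokenized_datasets = raw_datasets.map(
    tokenize_and_align_labels,
    batched=True,
    remove_columns=raw_datasets["train"].column_names,
)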