How do I use a fine-tuned Trainer model for inference correctly?

Hi,

I used the Transformers Trainer to fine-tune a RoBERTa model for an NER task. The code below creates the model and runs training:

    from transformers import (AutoModelForTokenClassification,
                              DataCollatorForTokenClassification, Trainer)

    model = AutoModelForTokenClassification.from_pretrained(model_name)
    hidden_size = model.classifier.in_features
    # Replace the default linear head with my custom MLP head
    model.classifier = MultiLayerPerceptronClassifier(hidden_size=hidden_size,
                                                      num_labels=len(label2idx))
    model.num_labels = len(label2idx)
    model.config.label2id = label2idx
    model.config.id2label = idx2label
    tokenizer = get_tokenizer(model_name)
    data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)

    trainer = Trainer(model=model,
                      tokenizer=tokenizer,
                      train_dataset=train_dataset,
                      eval_dataset=val_dataset,
                      data_collator=data_collator,
                      args=training_args,
                      compute_metrics=compute_metrics,
                      )
    trainer.train()

The above code saves a few checkpoints for the model.
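Each checkpoint lands under training_args.output_dir as a checkpoint-&lt;step&gt; directory. For reference, I also save the final model and tokenizer explicitly into one directory ("final_model" is just a path I picked):

    # Save the final weights + config and the tokenizer side by side
    trainer.save_model('final_model')
    tokenizer.save_pretrained('final_model')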

During training I tokenized with padding=True and truncation=True, without providing any maximum length (so the tokenizer pads each batch to its longest sequence and truncates at the model maximum), and used DataCollatorForTokenClassification for batching. Below is my tokenization code:

    tokenized_inputs = tokenizer(sample['tokens'], is_split_into_words=True, padding=True, truncation=True,
                                return_attention_mask=True, return_tensors='pt')

Moreover, I added a custom classifier head; the full model architecture is:

RobertaForTokenClassification(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(50265, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-11): 12 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): RobertaIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
            (intermediate_act_fn): GELUActivation()
          )
          (output): RobertaOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
  )
  (dropout): Dropout(p=0.1, inplace=False)
  (classifier): MultiLayerPerceptronClassifier(
    (dense_layer): Linear(in_features=768, out_features=768, bias=True)
    (activation): GELU(approximate='none')
    (classifier): Linear(in_features=768, out_features=35, bias=True)
  )
)
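
For reference, the head class is essentially the following (the forward pass here is a sketch reconstructed from the module names printed above):

    import torch.nn as nn

    class MultiLayerPerceptronClassifier(nn.Module):
        """Custom token-classification head: Linear -> GELU -> Linear."""
        def __init__(self, hidden_size, num_labels):
            super().__init__()
            self.dense_layer = nn.Linear(hidden_size, hidden_size)
            self.activation = nn.GELU()
            self.classifier = nn.Linear(hidden_size, num_labels)

        def forward(self, features):
            x = self.dense_layer(features)
            x = self.activation(x)
            return self.classifier(x)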

Now, I want to use this model for real-time inference (not batch prediction). I am facing the following challenges:

  1. The custom classifier weights are not initialized when I use AutoModelForTokenClassification.from_pretrained(checkpoint_path), even if I pass the config parameter. How can I load the custom classifier together with its weights? (My current attempt is the first sketch below.) The warning I get is:

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForTokenClassification: ['lm_head.dense.bias', 'lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.bias', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing RobertaForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).

  2. How should I decide the maximum length for padding and truncation when tokenizing the input? (My current understanding is in the second sketch below.)
  3. How can I utilise the power of the Trainer.predict() method to do real-time inference on a single text example? (Third sketch below.)
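
For question 1, what I am trying now is to rebuild the architecture exactly as in training and then load the checkpoint's weights on top. A minimal sketch (this assumes the checkpoint directory contains pytorch_model.bin; newer Transformers versions write model.safetensors instead, which safetensors.torch.load_file can read):

    import torch
    from transformers import AutoModelForTokenClassification

    # Rebuild the exact architecture used during training
    model = AutoModelForTokenClassification.from_pretrained('roberta-base')
    hidden_size = model.classifier.in_features  # read before swapping the head
    model.classifier = MultiLayerPerceptronClassifier(hidden_size=hidden_size,
                                                      num_labels=len(label2idx))

    # Load the fine-tuned weights (encoder + custom head) from the checkpoint
    state_dict = torch.load(f'{checkpoint_path}/pytorch_model.bin',
                            map_location='cpu')
    model.load_state_dict(state_dict)
    model.eval()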
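
For question 2, my current understanding is that roberta-base has 514 position embeddings but only 512 usable tokens once the special tokens are accounted for, so tokenizer.model_max_length is the hard ceiling; a single real-time example needs no padding at all:

    # Truncate at the model's own limit; one example needs no padding
    tokenized_inputs = tokenizer(sample['tokens'], is_split_into_words=True,
                                 truncation=True,
                                 max_length=tokenizer.model_max_length,  # 512
                                 return_attention_mask=True,
                                 return_tensors='pt')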
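
For question 3, my current workaround skips Trainer.predict() and runs a plain forward pass; the alternative I have seen is wrapping the single example in a datasets.Dataset first. A minimal sketch of both (tokenize_fn is a hypothetical helper that applies the same tokenization as in training):

    import torch
    from datasets import Dataset

    # Option A: plain forward pass on one example
    inputs = tokenizer(sample['tokens'], is_split_into_words=True,
                       truncation=True, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
    pred_ids = logits.argmax(dim=-1).squeeze(0).tolist()
    pred_labels = [idx2label[i] for i in pred_ids]

    # Option B: wrap the example in a Dataset so Trainer.predict() accepts it
    single_ds = Dataset.from_dict({'tokens': [sample['tokens']]}).map(tokenize_fn)
    predictions = trainer.predict(single_ds)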