Hi,
I have used the Transformers Trainer to fine-tune a RoBERTa model for an NER task. I used the code below to create and train the model:
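```python
from transformers import (
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    Trainer,
)

# MultiLayerPerceptronClassifier and get_tokenizer are custom helpers (not part of transformers);
# model_name, label2idx, idx2label, the datasets, training_args and compute_metrics are defined elsewhere.
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Replace the default linear head with a custom MLP head
hidden_size = model.classifier.in_features
model.classifier = MultiLayerPerceptronClassifier(hidden_size=hidden_size,
                                                  num_labels=len(label2idx))
model.num_labels = len(label2idx)
model.config.label2id = label2idx
model.config.id2label = idx2label

tokenizer = get_tokenizer(model_name)
data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)

trainer = Trainer(model=model,
                  tokenizer=tokenizer,
                  train_dataset=train_dataset,
                  eval_dataset=val_dataset,
                  data_collator=data_collator,
                  args=training_args,
                  compute_metrics=compute_metrics,
                  )
trainer.train()
```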
The above code saves a few checkpoints for the model.
During training I used a DataCollator, and I tokenized with padding=True and truncation=True without providing any maximum length. This is my tokenization code:
```python
tokenized_inputs = tokenizer(sample['tokens'], is_split_into_words=True, padding=True, truncation=True,
                             return_attention_mask=True, return_tensors='pt')
```
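For context: if sample['tokens'] holds a single sentence, padding=True has nothing to pad, and truncation=True without max_length falls back to tokenizer.model_max_length (512 for roberta-base). The actual padding then happens in DataCollatorForTokenClassification, which pads each batch to its longest sequence and pads the labels with -100 so the loss ignores them. The pure-Python sketch below only illustrates that idea and is not the library's implementation; pad_batch is a hypothetical helper, the token ids are arbitrary, and it assumes right padding with RoBERTa's pad token id of 1 (the padding_idx in the printed embeddings).

```python
from typing import Dict, List


def pad_batch(features: List[Dict[str, List[int]]],
              pad_token_id: int = 1,
              label_pad_id: int = -100) -> Dict[str, List[List[int]]]:
    """Hypothetical illustration of dynamic padding: right-pad to the longest sequence in the batch."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        # 1 marks real tokens, 0 marks padding the model should not attend to
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
        # -100 is the default ignore_index of torch.nn.CrossEntropyLoss
        batch["labels"].append(f["labels"] + [label_pad_id] * n_pad)
    return batch


features = [
    {"input_ids": [0, 713, 16, 2], "labels": [-100, 3, 0, -100]},
    {"input_ids": [0, 20920, 2], "labels": [-100, 5, -100]},
]
padded = pad_batch(features)
print(padded["input_ids"])  # [[0, 713, 16, 2], [0, 20920, 2, 1]]
```

Because padding is decided per batch, short batches stay short; only the truncation limit puts a hard cap on sequence length.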
I also added a custom classifier head; this is the printed model:
```
RobertaForTokenClassification(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(50265, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-11): 12 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): RobertaIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
            (intermediate_act_fn): GELUActivation()
          )
          (output): RobertaOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
  )
  (dropout): Dropout(p=0.1, inplace=False)
  (classifier): MultiLayerPerceptronClassifier(
    (dense_layer): Linear(in_features=768, out_features=768, bias=True)
    (activation): GELU(approximate='none')
    (classifier): Linear(in_features=768, out_features=35, bias=True)
  )
)
```
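The definition of MultiLayerPerceptronClassifier is not shown in the post. A minimal sketch consistent with the printed submodule names (dense_layer, activation, classifier) and with the constructor call in the training code could look like this; the order of operations in forward is an assumption. RobertaForTokenClassification applies its own dropout before calling self.classifier, so the head receives hidden states of shape (batch, seq_len, hidden_size).

```python
import torch
from torch import nn


class MultiLayerPerceptronClassifier(nn.Module):
    """Hypothetical reconstruction of the custom head: Linear -> GELU -> Linear."""

    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.dense_layer = nn.Linear(hidden_size, hidden_size)
        self.activation = nn.GELU()
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, hidden_size) -> (batch, seq_len, num_labels)
        return self.classifier(self.activation(self.dense_layer(hidden_states)))


head = MultiLayerPerceptronClassifier(hidden_size=768, num_labels=35)
logits = head(torch.randn(2, 7, 768))
print(logits.shape)  # torch.Size([2, 7, 35])
```

Printing this head gives the same three entries as the classifier section of the printout above.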
Now I want to use this model for real-time inference (not batch prediction), and I am facing the following challenges:
- The custom classifier weights are not initialized from the checkpoint when I call AutoModelForTokenClassification.from_pretrained(checkpoint_path), even if I pass the config parameter. How can I load the custom classifier with its trained weights? This is the warning I get:
  ```
  Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForTokenClassification: ['lm_head.dense.bias', 'lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.bias', 'lm_head.layer_norm.bias']
  - This IS expected if you are initializing RobertaForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  ```
- How should I choose the maximum length for padding and truncation when tokenizing input at inference time?
- How can I use the Trainer.predict() method to run real-time inference on a single text example?
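One possible approach to the first question, sketched under assumptions: from_pretrained only knows the stock RobertaForTokenClassification head (a single Linear layer), so the checkpoint's classifier.dense_layer.* and classifier.classifier.* tensors have no matching parameters and the head is freshly initialized. The sketch instead builds the architecture from the checkpoint's config, attaches the custom head, and then loads the saved state dict. It assumes an unsharded checkpoint containing either model.safetensors or pytorch_model.bin, and it repeats a hypothetical MultiLayerPerceptronClassifier reconstructed from the printed model; load_checkpoint_state_dict and load_model_with_custom_head are hypothetical names.

```python
import os

import torch
from torch import nn
from transformers import AutoConfig, AutoModelForTokenClassification


class MultiLayerPerceptronClassifier(nn.Module):
    """Hypothetical head reconstructed from the printed model: Linear -> GELU -> Linear."""

    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.dense_layer = nn.Linear(hidden_size, hidden_size)
        self.activation = nn.GELU()
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.activation(self.dense_layer(hidden_states)))


def load_checkpoint_state_dict(checkpoint_dir: str) -> dict:
    """Read the raw weights saved in a checkpoint directory (assumes a single, unsharded file)."""
    safetensors_path = os.path.join(checkpoint_dir, "model.safetensors")
    if os.path.exists(safetensors_path):
        from safetensors.torch import load_file
        return load_file(safetensors_path)
    return torch.load(os.path.join(checkpoint_dir, "pytorch_model.bin"), map_location="cpu")


def load_model_with_custom_head(checkpoint_dir: str) -> nn.Module:
    config = AutoConfig.from_pretrained(checkpoint_dir)
    # Build the architecture without pretrained weights, then swap in the custom head
    model = AutoModelForTokenClassification.from_config(config)
    model.classifier = MultiLayerPerceptronClassifier(config.hidden_size, config.num_labels)
    result = model.load_state_dict(load_checkpoint_state_dict(checkpoint_dir), strict=False)
    # Every head parameter must come from the checkpoint, not from random initialization
    missing_head = [k for k in result.missing_keys if k.startswith("classifier.")]
    if missing_head:
        raise RuntimeError(f"Classifier weights missing from checkpoint: {missing_head}")
    return model.eval()
```

Usage might look like model = load_model_with_custom_head("path/to/checkpoint"), where the path is a placeholder for one of the checkpoint directories written by the Trainer. Since a tokenizer was passed to the Trainer, those directories should also contain the tokenizer files.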
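For the maximum-length question, one common heuristic (an assumption, not something the Trainer prescribes): a single example needs no padding at all, so only truncation matters. The hard ceiling is the model limit, 512 tokens for roberta-base (its 514 position embeddings include an offset of 2 for the padding index). Below that ceiling, max_length can be taken from a high percentile of the tokenized lengths of the training data. choose_max_length and the sample lengths are hypothetical.

```python
import math
from typing import Sequence


def choose_max_length(token_lengths: Sequence[int], percentile: float = 99.0,
                      model_max_length: int = 512) -> int:
    """Hypothetical helper: truncation length covering `percentile`% of examples, capped at the model limit.

    token_lengths would come from len(tokenizer(words, is_split_into_words=True)["input_ids"])
    computed over the training set.
    """
    if not token_lengths:
        raise ValueError("need at least one length")
    ordered = sorted(token_lengths)
    # Nearest-rank percentile: smallest length covering the requested share of examples
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return min(ordered[rank - 1], model_max_length)


lengths = [12, 30, 45, 47, 51, 60, 64, 80, 120, 700]
print(choose_max_length(lengths, percentile=90))   # 120
print(choose_max_length(lengths, percentile=100))  # 512 (capped at the model limit)
```

At inference time, calling the tokenizer with truncation=True and this max_length, and without padding, keeps a single example at its natural length.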
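For the last question: Trainer.predict() takes a whole dataset and runs it through a DataLoader and evaluation loop, which is a lot of machinery for one sentence, so a plain forward pass under torch.no_grad() is a simpler sketch for real-time use. It assumes a fast tokenizer created the same way as during training (for RoBERTa with is_split_into_words=True that means add_prefix_space=True), input already split into words, and CPU execution. Each word gets the prediction of its first sub-token; that aggregation strategy is an assumption, and predict_words and first_subtoken_labels are hypothetical names.

```python
from typing import List, Optional

import torch


def first_subtoken_labels(word_ids: List[Optional[int]], pred_ids: List[int],
                          id2label: dict) -> List[str]:
    """Map sub-token predictions back to one label per word (the first sub-token wins)."""
    labels: List[str] = []
    previous = None
    for word_id, pred in zip(word_ids, pred_ids):
        # None marks special tokens such as <s> and </s>
        if word_id is not None and word_id != previous:
            labels.append(id2label[pred])
        previous = word_id
    return labels


@torch.no_grad()
def predict_words(words: List[str], tokenizer, model, max_length: int = 512) -> List[str]:
    """Hypothetical real-time helper: tokenize one example, run the model once, decode labels."""
    model.eval()
    encoding = tokenizer(words, is_split_into_words=True, truncation=True,
                         max_length=max_length, return_tensors="pt")
    logits = model(input_ids=encoding["input_ids"],
                   attention_mask=encoding["attention_mask"]).logits
    pred_ids = logits.argmax(dim=-1)[0].tolist()
    return first_subtoken_labels(encoding.word_ids(0), pred_ids, model.config.id2label)
```

Words that fall beyond max_length after truncation receive no label, so the returned list can be shorter than the input in that case.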