Fine-tuning on a specific task when the pretrained model isn't trained on that task? Using the task model vs the base model

I want to fine-tune RobertaForSequenceClassification on the microsoft/codebert-base model. This microsoft/codebert-base model hasn't been trained on a sequence-classification task.
Can I load this pre-trained model inside a SequenceClassification class and fine-tune it on my dataset?

from transformers import RobertaForSequenceClassification

model = RobertaForSequenceClassification.from_pretrained("microsoft/codebert-base")

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at microsoft/codebert-base and are newly initialized: ['classifier.dense.weight', 'classifier.out_proj.weight', 'classifier.out_proj.bias', 'classifier.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

While loading, I get this message, which is expected since the model isn't trained on the task and thus doesn't have the classifier weights.

Can I proceed with fine-tuning this RobertaForSequenceClassification model, or would I need to define my own classifier layer on top of RobertaModel and train that?

Hi @mayanksatnalika
Yes, you can load a pre-trained base model for SequenceClassification.
RobertaForSequenceClassification adds the classification head itself, so you won't need to do that manually. You can fine-tune it for classification directly.
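
For example, here is a minimal fine-tuning sketch with the Trainer API; the toy dataset, num_labels=2, and the training arguments are placeholders you would adapt to your own data:

import torch
from transformers import (
    RobertaForSequenceClassification,
    RobertaTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = RobertaTokenizer.from_pretrained("microsoft/codebert-base")
# num_labels sizes the freshly initialized classification head (2 is a placeholder)
model = RobertaForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2
)

# Toy examples standing in for a real dataset
texts = ["def add(a, b): return a + b", "def sub(a, b): return a - b"]
labels = [0, 1]
encodings = tokenizer(texts, truncation=True, padding=True)

class ToyDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=ToyDataset(encodings, labels),
)
trainer.train()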


Thank you @valhalla for the quick reply.

Just trying to understand it better: I came across https://github.com/curiousily/Getting-Things-Done-with-Pytorch/blob/master/08.sentiment-analysis-with-bert.ipynb, where a classifier is added on top of the base model and trained, rather than directly using a …ForSequenceClassification model. Would you be able to tell me how the two approaches differ, or point me to any relevant links? :sweat_smile:

The ForSequenceClassification models do pretty much the same thing. They take a base model and add a pooler and classifier head on top of it, so you won't have to do it manually. You can see how it's done here; it's pretty easy to follow.
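
For comparison, here is a rough sketch of the manual approach from that notebook applied to CodeBERT. The class name, dropout value, and num_labels are illustrative, and it assumes a transformers version whose model outputs expose last_hidden_state:

import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer

class ManualRobertaClassifier(nn.Module):
    # Illustrative hand-rolled head; roughly what RobertaForSequenceClassification
    # wires up for you automatically.
    def __init__(self, num_labels=2):
        super().__init__()
        self.roberta = RobertaModel.from_pretrained("microsoft/codebert-base")
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(self.roberta.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.roberta(input_ids=input_ids, attention_mask=attention_mask)
        # Use the hidden state of the first token (<s>) as the sequence representation
        cls_state = outputs.last_hidden_state[:, 0, :]
        return self.classifier(self.dropout(cls_state))

tokenizer = RobertaTokenizer.from_pretrained("microsoft/codebert-base")
model = ManualRobertaClassifier(num_labels=2)
batch = tokenizer(["def add(a, b): return a + b"], return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])

The built-in head does essentially this, except it also passes the <s> state through a dense + tanh layer before the output projection (those are the classifier.dense and classifier.out_proj weights from the warning above).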


Thanks a lot :slight_smile: