How do I change the classification head of a model?

Hi @oliverguhr, which solution worked for you for binary classification?

Thank you! This fixed my problem too!

It was weird that this didn’t work:
AutoModelForSequenceClassification.from_pretrained("huggingface/CodeBERTa-language-id", num_labels=15)

but this did:

config = AutoConfig.from_pretrained("huggingface/CodeBERTa-language-id")
config.num_labels = 15
model = AutoModelForSequenceClassification.from_config(config)

In reply to tolgayan: the point is that this gets trained; you're fine-tuning the model.
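For anyone wondering what that training step looks like in practice, here is a minimal sketch using the Trainer API; the dataset variables (train_ds, eval_ds) and the hyperparameters are placeholders I am assuming, not something from the original post:

from transformers import TrainingArguments, Trainer

# `model` is the re-headed model built above (num_labels=15); train_ds/eval_ds
# are assumed to be tokenized datasets with an integer "labels" column (0..14).
training_args = TrainingArguments(
    output_dir="codeberta-15-labels",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()  # the randomly initialized classification head gets updated here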


I think this is the simplest and most intuitive one. Why did nobody like this?


I find the solution by @nielsr, i.e. adding the parameter ignore_mismatched_sizes, the most elegant and simple one. It also makes clear what happens within the code.


Hi @carlosaguayo,

Initializing a model from a config will randomly initialize all the weights of the model. To use the pre-trained weights and add a new, randomly initialized head on top, you would need to do:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("huggingface/CodeBERTa-language-id", num_labels=15, ignore_mismatched_sizes=True)

Simple but best solution, it solves everything.

I used the 'label_names' argument on my Trainer to define which labels I wanted instead of the default 'labels' choice.

On the Trainer, I set 'num_labels=6' and 'ignore_mismatched_sizes=True' appropriately; however, when running trainer.train() I get the following error:

TypeError: forward() got an unexpected keyword argument 'cohesion'
(My 6 labels are ['cohesion', 'syntax', 'vocabulary', 'phraseology', 'grammar', 'conventions'])

How would I fix this? Thanks in advance!

EDIT: I fixed this by passing a label matrix as the labels column instead of using label_names in the training args (see the sketch below), but if someone knows how to properly use that Trainer argument I'd appreciate it.
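For reference, here is a minimal sketch of that workaround, assuming a Hugging Face Datasets dataset with one float column per target; the dataset variable and the preprocessing details are my assumptions, not from the original post:

target_cols = ["cohesion", "syntax", "vocabulary",
               "phraseology", "grammar", "conventions"]

def merge_targets(example):
    # Collapse the six score columns into a single "labels" vector so that
    # forward() receives labels=... instead of cohesion=..., syntax=..., etc.
    example["labels"] = [float(example[col]) for col in target_cols]
    return example

dataset = dataset.map(merge_targets, remove_columns=target_cols)

Depending on whether you treat these as regression targets or multi-label targets, you may also need to set problem_type on the model config so the right loss is used.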

In fact, you can customize the pre-trained model by changing its layers. For instance, I use BertForSequenceClassification for a classification task.

from copy import deepcopy
import torch
import torch.nn as nn
from transformers import BertTokenizer, AutoModelForSequenceClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased')
model.to(device)

However, if I want to change its classification head, I would do this:

cp_model = deepcopy(model)
# Replace the default single-layer classifier with a deeper head.
cp_model.classifier = nn.Sequential(
    nn.Linear(768, 526),
    nn.ReLU(),
    nn.Dropout(0.1),
    nn.Linear(526, 258),
    nn.ReLU(),
    nn.Dropout(0.1),
    nn.Linear(258, 2),
    nn.Softmax(dim=-1)
)
cp_model.to(device)

In fact, you can do this directly on the model, but I make a copy because I do not want to change anything on the base model (it's just personal preference).
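As a quick sanity check, a forward pass with the modified head could look like this (the example sentence is made up). One caveat: the built-in loss of BertForSequenceClassification applies cross-entropy to raw logits, so with a Softmax at the end of the head you would typically either drop the Softmax or compute the loss yourself.

inputs = tokenizer("This movie was great!", return_tensors="pt").to(device)

with torch.no_grad():
    outputs = cp_model(**inputs)

probs = outputs.logits  # already probabilities here, since the head ends in Softmax
print(probs.shape)      # torch.Size([1, 2])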

Hello @nielsr

I hope you are well. I am fine-tuning GPT-Neo and, to overcome overfitting, I want to increase the dropout to 0.2. If I do this with the command below, can I use the model for fine-tuning directly with my own dataset?

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt-neo")

# Override the dropout values in the config when loading the pre-trained weights.
model = AutoModelForCausalLM.from_pretrained("gpt-neo", embed_dropout=0.2, resid_dropout=0.2, attention_dropout=0.2)