Hi @oliverguhr, which solution worked for you for binary classification?
Thank you! This fixed my problem too!
It was weird that this didn't work:
AutoModelForSequenceClassification.from_pretrained("huggingface/CodeBERTa-language-id", num_labels=15)
but this did:
from transformers import AutoConfig, AutoModelForSequenceClassification
config = AutoConfig.from_pretrained("huggingface/CodeBERTa-language-id")
config.num_labels = 15
model = AutoModelForSequenceClassification.from_config(config)
Answering @tolgayan: the point is that this gets trained; you're fine-tuning the model.
I think this is the simplest and most intuitive one. Why did nobody like this?
I find the solution by @nielsr, i.e. adding the parameter ignore_mismatched_sizes, the most elegant and simple one. It also explains what happens within the code.
Hi @carlosaguayo,
Initializing a model from a config will randomly initialize all the weights of the model. To use the pre-trained weights and add a new, randomly initialized head on top, you would need to do:
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("huggingface/CodeBERTa-language-id", num_labels=15, ignore_mismatched_sizes=True)
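A quick sanity check on the model loaded above (a hedged sketch; the attribute names assume the usual RoBERTa-style classification head that this checkpoint uses) confirms that only the head was re-initialized at the new size:
# At load time you should see a warning that the classifier weights were newly
# initialized because their shapes did not match; everything else keeps the
# pre-trained values.
print(model.config.num_labels)    # 15
print(model.classifier.out_proj)  # e.g. Linear(in_features=768, out_features=15, bias=True)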
Simple, but the best solution; it solves everything.
I used the "label_names" argument on my trainer to define which labels I wanted and not the default "labels" choice.
On the trainer, I set "num_labels=6" and "ignore_mismatched_sizes=True" appropriately; however, when calling trainer.train() I get the following error:
TypeError: forward() got an unexpected keyword argument 'cohesion'
(My 6 labels are ["cohesion", "syntax", "vocabulary", "phraseology", "grammar", "conventions"])
How would I fix this? Thanks in advance!
EDIT: I fixed this by passing a label matrix as labels instead of using label_names in the training args (see the sketch below), but if someone knows how to properly use that trainer argument I'd appreciate it.
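For reference, here is a minimal sketch of that workaround (the column names and checkpoint are hypothetical; I am assuming the six targets are continuous scores, hence problem_type="regression", otherwise use "multi_label_classification"):
from transformers import AutoModelForSequenceClassification

TARGETS = ["cohesion", "syntax", "vocabulary", "phraseology", "grammar", "conventions"]

def add_label_vector(example):
    # Collapse the six per-trait columns into one "labels" vector, so the model's
    # forward() receives a single labels keyword it actually accepts.
    example["labels"] = [float(example[t]) for t in TARGETS]
    return example

# dataset = dataset.map(add_label_vector)  # hypothetical datasets.Dataset with one column per trait

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",           # hypothetical checkpoint
    num_labels=6,
    problem_type="regression",     # six continuous scores per example
)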
In fact, you can customize the pre-trained model by changing its layers. For instance, I use BertForSequenceClassification for a classification task.
import torch
from transformers import BertTokenizer, BertForSequenceClassification, AutoModelForSequenceClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased')
model.to(device)
However, if I want to change its classification head, I would do this:
from copy import deepcopy
from torch import nn

cp_model = deepcopy(model)
cp_model.classifier = nn.Sequential(
    nn.Linear(768, 526),
    nn.Dropout(0.1),
    nn.Dropout(0.1),
    nn.Linear(526, 258),
    nn.ReLU(),
    nn.Dropout(0.1),
    nn.Linear(258, 2),
    nn.Softmax(dim=-1),
)
cp_model.to(device)
In fact, you could do this directly on the model, but I make a copy because I do not want to change anything on the base model (it's just personal preference).
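As a quick usage check (the example input is made up), the copied model with the new head runs end to end:
inputs = tokenizer("def add(a, b): return a + b", return_tensors="pt").to(device)
with torch.no_grad():
    outputs = cp_model(**inputs)
# The "logits" field now holds the Softmax probabilities from the new head.
print(outputs.logits.shape)  # torch.Size([1, 2])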
Hello @nielsr
I hope you are well. I am fine-tuning GPT-Neo and, to overcome overfitting, I want to increase the dropout to 0.2. If I do this with the command below, can I then fine-tune the model directly on my own dataset?
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("gpt-neo")
model = AutoModelForMaskedLM.from_pretrained("gpt-neo", embed_dropout=0.2, resid_dropout=0.2, attention_dropout=0.2)
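For reference, here is a hedged sketch of the same idea done through the config (the checkpoint name is a placeholder; note that GPT-Neo is decoder-only, so a causal-LM head is used here rather than a masked-LM one):
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

checkpoint = "EleutherAI/gpt-neo-125M"   # placeholder checkpoint
config = AutoConfig.from_pretrained(checkpoint)
config.embed_dropout = 0.2
config.resid_dropout = 0.2
config.attention_dropout = 0.2

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# Pre-trained weights are kept; only the dropout probabilities change,
# so the model can then be fine-tuned on the custom dataset as usual.
model = AutoModelForCausalLM.from_pretrained(checkpoint, config=config)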