I’m trying to probe a pretrained XLMForSequenceClassification model. I want to freeze all layers except the last, classifying layer. Which layer is that in XLMForSequenceClassification? When I call .named_parameters(), the last layer appears to be:
This is unlike a pretrained BertForSequenceClassification, where the last layer is explicitly named classifier. What is sequence_summary? Can I assume it is the classifying layer?
Even if I leave this layer unfrozen, I still get the error:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
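This error typically means that no parameter feeding into the loss has requires_grad=True, so the loss tensor has no grad_fn for backward() to follow. Here is a minimal, self-contained sketch of the freeze-all-but-the-head pattern on a toy module; the name sequence_summary mirrors what XLMForSequenceClassification reports for its head (verify with print(model) on your actual model):

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained model: a "body" and a "head".
# The head's name mirrors XLM's sequence_summary module (an assumption
# here; check print(model) on the real XLMForSequenceClassification).
model = nn.ModuleDict({
    "transformer": nn.Linear(8, 8),
    "sequence_summary": nn.Linear(8, 2),
})

# If you freeze *everything*, the loss has no grad_fn and backward()
# raises "element 0 of tensors does not require grad and does not have
# a grad_fn". The correct pattern unfreezes the head by name prefix:
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("sequence_summary")

x = torch.randn(4, 8)
loss = model["sequence_summary"](model["transformer"](x)).sum()
loss.backward()  # works, since the head still requires grad
```

If the error persists even with the head unfrozen, check that the loss is computed from the model's output and not, for example, from a tensor created with torch.no_grad() or .detach().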
model = BertForSequenceClassification.from_pretrained('bert-base-cased')
When I print the named_parameters of this model, the last entries are "classifier.weight" and "classifier.bias".
When I just print the model itself, I see that the last layer is:
(classifier): Linear(in_features=768, out_features=2, bias=True)
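Both views can be reproduced in a few lines (this sketch downloads bert-base-cased from the Hugging Face hub):

```python
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-cased")

# Parameter view: the last two named parameters belong to the head.
names = [n for n, _ in model.named_parameters()]
print(names[-2:])        # ['classifier.weight', 'classifier.bias']

# Module view: print(model) shows the full structure; the head alone is
# a Linear layer sized hidden_size -> num_labels (768 -> 2 by default).
print(model.classifier)  # Linear(in_features=768, out_features=2, bias=True)
```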
model = XLNetForSequenceClassification.from_pretrained('xlnet-base-cased')
When I print the named_parameters of this model, the last entries are "logits_proj.weight" and "logits_proj.bias".
When I just print the model itself, I see that the last layer is:
(logits_proj): Linear(in_features=768, out_features=2, bias=True)
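Note that XLNet has both a sequence_summary module (pooling) and a final logits_proj; listing the top-level submodules shows their order (sketch; downloads xlnet-base-cased from the hub):

```python
from transformers import XLNetForSequenceClassification

model = XLNetForSequenceClassification.from_pretrained("xlnet-base-cased")

# Top-level submodules, in order. Expect something like:
# transformer, sequence_summary, logits_proj
for name, _ in model.named_children():
    print(name)

# The final projection maps hidden_size -> num_labels:
print(model.logits_proj)  # Linear(in_features=768, out_features=2, bias=True)
```

So sequence_summary pools the hidden states, and logits_proj produces the actual classification logits.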
In the model above, the "sequence_summary" module you mentioned actually sits one level above the logits_proj parameters. Since you see "sequence_summary" as the last layer, can you show how you create your model?
You can still see that we have a linear layer with in_features of 1280 and out_features of 2.
When using 'bert-base-cased' or 'xlnet-base-cased', we see that the last layer has a different name (classifier and logits_proj respectively). These names presumably follow the papers that introduced the models. So, in addition to printing the named_parameters, you could print the model itself to see how the last layer is structured and what it is called, and then use those names accordingly.
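A model-agnostic way to apply that advice is a small helper (last_linear is a hypothetical name, not a transformers API) that walks named_modules() and returns the final nn.Linear, whatever the architecture calls it:

```python
import torch.nn as nn

def last_linear(model: nn.Module):
    """Return (name, module) for the last nn.Linear found in the model.

    named_modules() yields submodules in registration order, so the last
    Linear seen is the final one in the module tree.
    """
    found = None
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            found = (name, module)
    return found
```

For BertForSequenceClassification this should return the classifier module, and for XLNetForSequenceClassification the logits_proj module, without having to remember either name.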
Regarding your RuntimeError, could you share your code so we can see where the issue occurs?
I am using model = XLNetForSequenceClassification.from_pretrained('xlnet-base-cased')
for text classification. On what dataset was this XLNetForSequenceClassification pretrained?