XLMForSequenceClassification classifier layer?

I’m trying to probe a pretrained XLMForSequenceClassification model. I want to freeze all layers except the final classification layer. Which layer is that for XLMForSequenceClassification? When I call .named_parameters(), the last parameters listed are:

sequence_summary.summary.weight
sequence_summary.summary.bias 

This is unlike a pretrained BertForSequenceClassification, where the last layer is explicitly named classifier. What is sequence_summary? Can I assume it is the classification layer?

Even if I leave this layer unfrozen, I still seem to get the error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

model = BertForSequenceClassification.from_pretrained('bert-base-cased')

When I print the named_parameters of this model, I get “classifier.weight” and “classifier.bias”.
When I just print the model above, I see the last layer is:
(classifier): Linear(in_features=768, out_features=2, bias=True)
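
For reference, here is a minimal sketch of how you could freeze everything except that classifier head (standard transformers/PyTorch API; matching on the parameter name prefix is just one way to select the head shown above):

import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained('bert-base-cased')

# Freeze every parameter except the classification head
for name, param in model.named_parameters():
    param.requires_grad = name.startswith('classifier')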


model = XLNetForSequenceClassification.from_pretrained('xlnet-base-cased')
When I print the named_parameters of this model, I get “logits_proj.weight” and “logits_proj.bias”.
When I just print the model above, I see the last layer is:
(logits_proj): Linear(in_features=768, out_features=2, bias=True)
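
If you want to double-check where the head sits, you can list the last few parameter names. This sketch assumes the usual parameter ordering, with the head modules registered last:

from transformers import XLNetForSequenceClassification

model = XLNetForSequenceClassification.from_pretrained('xlnet-base-cased')

# Print the last few parameter names to locate the classification head
for name, _ in list(model.named_parameters())[-4:]:
    print(name)
# Expect something like:
#   sequence_summary.summary.weight
#   sequence_summary.summary.bias
#   logits_proj.weight
#   logits_proj.bias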

In the model above, the “sequence_summary” you mentioned actually sits one level above the logits_proj parameter. Since you see “sequence_summary” as the last layer, can you show how you create your model?

Sure, I am using the following code to get the tokenizer and model:

tokenizer = XLMTokenizer.from_pretrained('xlm-mlm-100-1280')
model = XLMForSequenceClassification.from_pretrained('xlm-mlm-100-1280')

You can print the model and you will see the structure:

tokenizer = XLMTokenizer.from_pretrained('xlm-mlm-100-1280')
model = XLMForSequenceClassification.from_pretrained('xlm-mlm-100-1280')
print(model)

...
(sequence_summary): SequenceSummary(
    (summary): Linear(in_features=1280, out_features=2, bias=True)
    (activation): Identity()
    (first_dropout): Dropout(p=0.1, inplace=False)
    (last_dropout): Identity()
)

You can still see the linear layer with in_features=1280 and out_features=2.
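
So you can treat sequence_summary (specifically sequence_summary.summary) as the classification head here. A minimal sketch of probing it, i.e. freezing everything else (assuming the parameter names shown in the printout above):

from transformers import XLMForSequenceClassification

model = XLMForSequenceClassification.from_pretrained('xlm-mlm-100-1280')

# Freeze the whole model except the sequence_summary head
for name, param in model.named_parameters():
    param.requires_grad = name.startswith('sequence_summary')

# Sanity check: only the head should remain trainable
print([n for n, p in model.named_parameters() if p.requires_grad])
# ['sequence_summary.summary.weight', 'sequence_summary.summary.bias']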

When using ‘bert-base-cased’ or ‘xlnet-base-cased’, the last layer has a different name (classifier and logits_proj respectively). These names presumably come from the papers that introduced the models. So, in addition to printing the named_parameters, you can also print the model itself to see how the last layer is structured and what it is called, then use that name accordingly.

On your RuntimeError, could you share your code so we can see where the issue occurs?
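
In the meantime, one common cause of that error is calling loss.backward() when no tensor in the computation graph requires gradients, e.g. because every parameter was frozen. A minimal sketch that reproduces it:

import torch

x = torch.randn(4, 8)
w = torch.randn(8, 2)  # requires_grad is False by default
loss = (x @ w).sum()   # no tensor in this graph requires grad
loss.backward()        # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn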

Hi All,

I am using model = XLNetForSequenceClassification.from_pretrained('xlnet-base-cased')
for text classification. On what dataset was this XLNetForSequenceClassification pretrained?

Thanks in advance