XLMForSequenceClassification classifier layer?

I’m trying to probe a pretrained XLMForSequenceClassification model. I want to freeze all layers except the final classification layer. Which layer is that for XLMForSequenceClassification? When I call .named_parameters(), the last parameters listed are:

sequence_summary.summary.weight
sequence_summary.summary.bias 

This is unlike a pretrained BertForSequenceClassification, where the last layer is explicitly named classifier. What is sequence_summary? Can I assume it is the classification layer?

Even if I leave this layer unfrozen, I still seem to get the error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

model = BertForSequenceClassification.from_pretrained('bert-base-cased')

When I print the named_parameters of this model, I get “classifier.weight” and “classifier.bias”.
When I just print the model above, I see the last layer is:
(classifier): Linear(in_features=768, out_features=2, bias=True)
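
For reference, here is a minimal sketch of how you could freeze everything except that classifier head (standard transformers/PyTorch API; matching on the parameter name prefix is just one way to select the head shown above):

import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained('bert-base-cased')

# Freeze every parameter except the classification head
for name, param in model.named_parameters():
    param.requires_grad = name.startswith('classifier')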


model = XLNetForSequenceClassification.from_pretrained('xlnet-base-cased')
When I print the named_parameters of this model, I get “logits_proj.weight” and “logits_proj.bias”.
When I just print the model above, I see the last layer is:
(logits_proj): Linear(in_features=768, out_features=2, bias=True)
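
If you want to double-check where the head sits, you can list the last few parameter names. This sketch assumes the usual parameter ordering, with the head modules registered last:

from transformers import XLNetForSequenceClassification

model = XLNetForSequenceClassification.from_pretrained('xlnet-base-cased')

# Print the last few parameter names to locate the classification head
for name, _ in list(model.named_parameters())[-4:]:
    print(name)
# Expect something like:
#   sequence_summary.summary.weight
#   sequence_summary.summary.bias
#   logits_proj.weight
#   logits_proj.bias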

In the model above, the “sequence_summary” you mentioned actually sits one level above the logits_proj parameter. Since you see “sequence_summary” as the last layer, can you show how you create your model?

Sure, I am using the following code to get the tokenizer and model:

tokenizer = XLMTokenizer.from_pretrained('xlm-mlm-100-1280')
model = XLMForSequenceClassification.from_pretrained('xlm-mlm-100-1280')

You can print the model and you will see the structure:

tokenizer = XLMTokenizer.from_pretrained('xlm-mlm-100-1280')
model = XLMForSequenceClassification.from_pretrained('xlm-mlm-100-1280')
print(model)

...
(sequence_summary): SequenceSummary(
    (summary): Linear(in_features=1280, out_features=2, bias=True)
    (activation): Identity()
    (first_dropout): Dropout(p=0.1, inplace=False)
    (last_dropout): Identity()
)

You can still see the linear layer with in_features=1280 and out_features=2.
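
So you can treat sequence_summary (specifically sequence_summary.summary) as the classification head here. A minimal sketch of probing it, i.e. freezing everything else (assuming the parameter names shown in the printout above):

from transformers import XLMForSequenceClassification

model = XLMForSequenceClassification.from_pretrained('xlm-mlm-100-1280')

# Freeze the whole model except the sequence_summary head
for name, param in model.named_parameters():
    param.requires_grad = name.startswith('sequence_summary')

# Sanity check: only the head should remain trainable
print([n for n, p in model.named_parameters() if p.requires_grad])
# ['sequence_summary.summary.weight', 'sequence_summary.summary.bias']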

When using ‘bert-base-cased’ or ‘xlnet-base-cased’, the last layer has a different name (classifier and logits_proj respectively). These names presumably come from the papers that introduced the models. So, in addition to printing the named_parameters, you can also print the model itself to see how the last layer is structured and what it is called, then use that name accordingly.

On your RuntimeError, could you share your code so we can see where the issue occurs?
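
In the meantime, one common cause of that error is calling loss.backward() when no tensor in the computation graph requires gradients, e.g. because every parameter was frozen. A minimal sketch that reproduces it:

import torch

x = torch.randn(4, 8)
w = torch.randn(8, 2)  # requires_grad is False by default
loss = (x @ w).sum()   # no tensor in this graph requires grad
loss.backward()        # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn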

Hi All,

I am using model = XLNetForSequenceClassification.from_pretrained('xlnet-base-cased')
for text classification. On what dataset was this XLNetForSequenceClassification pretrained?

Thanks in advance