Multi class text classification tutorial: how does he get away with one out_feature on linear layer?

dickdanieljr · August 7, 2020, 11:14pm

I’ve read this tutorial: https://github.com/abhimishra91/transformers-tutorials/blob/master/transformers_multiclass_classification.ipynb

As I see it, the dataset uses 4 labels. I thought that would imply 4 out_features on the last linear layer, but when I check his model, it uses just 1.

Please help me clear up my misunderstanding.

lewtun · January 22, 2021, 7:56pm

Hi @dickdanieljr, are you referring to this class?

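import torch
from transformers import DistilBertModel

class DistillBERTClass(torch.nn.Module):
    def __init__(self):
        super(DistillBERTClass, self).__init__()
        # Pretrained encoder; its hidden size is 768
        self.l1 = DistilBertModel.from_pretrained("distilbert-base-uncased")
        self.pre_classifier = torch.nn.Linear(768, 768)
        self.dropout = torch.nn.Dropout(0.3)
        # Final layer: 768 in_features, 4 out_features (one per label)
        self.classifier = torch.nn.Linear(768, 4)

    def forward(self, input_ids, attention_mask):
        output_1 = self.l1(input_ids=input_ids, attention_mask=attention_mask)
        # output_1[0] is the last hidden state: (batch, seq_len, 768)
        hidden_state = output_1[0]
        # Take the first token's vector as the sequence representation
        pooler = hidden_state[:, 0]
        pooler = self.pre_classifier(pooler)
        pooler = torch.nn.ReLU()(pooler)
        pooler = self.dropout(pooler)
        # Logits of shape (batch, 4)
        output = self.classifier(pooler)
        return output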

Here you can see that the last layer is a torch.nn.Linear with in_features=768 (hidden_dim) and out_features=4 (num_labels), i.e. one output per label - or am I missing something?
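
If it helps, here is a minimal sketch for checking the shapes yourself. It does not load DistilBERT; I am assuming a random tensor as a stand-in for the pooled hidden state, and the variable names (hidden_dim, num_labels, batch_size, pooled) are mine, not from the tutorial.

```python
import torch

# Assumed sizes: 768 matches the hidden size used in the class above,
# 4 matches the number of labels; the batch size is arbitrary.
hidden_dim, num_labels, batch_size = 768, 4, 8

classifier = torch.nn.Linear(hidden_dim, num_labels)
print(classifier.out_features)  # 4

# Random stand-in for hidden_state[:, 0] (NOT real DistilBERT output)
pooled = torch.randn(batch_size, hidden_dim)
logits = classifier(pooled)
print(logits.shape)  # torch.Size([8, 4])

# With 4 logits per example, CrossEntropyLoss takes integer class ids
# in the range [0, num_labels) as targets.
targets = torch.randint(0, num_labels, (batch_size,))
loss = torch.nn.CrossEntropyLoss()(logits, targets)

# The predicted class is the index of the largest logit in each row.
predictions = logits.argmax(dim=1)
```

Note that the targets have one integer per example even though the layer has 4 outputs, which may be where a "1 vs 4" confusion could come from.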

dickdanieljr · January 22, 2021, 10:19pm

Yeah, that looks ok. I don’t remember the issue I had after such a long time. I described it very poorly. Thank you for the message though!
