How do I make SeqClassifier that accepts multiple sequences?

So suppose I have the following code:

import torch
import torch.nn as nn
from transformers import BertModel

class SeqClassifier(nn.Module):
    def __init__(self, n_classes):
        super(SeqClassifier, self).__init__()  # was SentimentClassifier, which raises a NameError
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.drop = nn.Dropout(p=0.3)
        # hidden_size is 768 for bert-base-uncased
        self.out = nn.Linear(self.bert.config.hidden_size, n_classes)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids, attention_mask=attention_mask)
        pooled_output = out.pooler_output
        # currently [1, 768]; I want [num_seq, 768]
        output = self.out(self.drop(pooled_output))
        return output

The pooled output is the embedding of the [CLS] token after a linear transformation and a tanh activation; it is not the same as the [CLS] embedding that comes straight out of the model's last hidden layer. What I want is a way to pass in multiple sentences at once, so that pooled_output becomes [num_seq, 768] and the classifier can train on all of them directly.
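For context, this is roughly what the model's pooler computes (a minimal sketch of what transformers' BertPooler does; the function name and the `dense` argument are just placeholders here). The key point is that it operates row-wise, so the batch dimension carries straight through:

import torch

# rough sketch of the pooler: take the hidden state of the first token
# ([CLS]) in every sequence of the batch, run it through a learned
# linear layer, then tanh
def pool(last_hidden_state, dense):       # last_hidden_state: [batch, seq_len, 768]
    cls_hidden = last_hidden_state[:, 0]  # [batch, 768], one [CLS] per sequence
    return torch.tanh(dense(cls_hidden))  # [batch, 768]

So if input_ids had shape [num_seq, seq_len] to begin with, pooler_output would already come out as [num_seq, 768].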

This is how I process my input at the moment:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
tokenized = tokenizer(text)  # 'text' stands in for one raw input sentence
input_ids = torch.tensor([tokenized['input_ids']], dtype=torch.long)  # [1, seq_len]
input_mask = torch.tensor([tokenized['attention_mask']], dtype=torch.long)

Everything works as long as I feed in a single sequence, but as soon as I try more than one, I get confused.

One option seems to be to prepend a [CLS] token to each new sequence, so that the tokenizer automatically adds the required ids.
However, the BERT model still returns only one pooled output, not one per sequence.

Any ideas what I’m doing wrong?


Okay, I figured it out. I'm not sure whether I'm supposed to delete the post or keep it, but in case anyone else struggles with this:

The reason it went wrong was the tensor dimensions: the model expects input_ids as a batch of shape [num_seq, seq_len], and mine had an extra leading dimension of size 1.

bert_model = BertModel.from_pretrained('bert-base-uncased')

# each input is [1, seq_len]; stacking two on dim=1 gives [1, 2, seq_len],
# and squeeze() drops the leading 1, leaving a proper batch of [2, seq_len]
two_input_ids = torch.stack([input_ids, input_ids], dim=1).squeeze()
two_input_masks = torch.stack([input_mask, input_mask], dim=1).squeeze()
out = bert_model(two_input_ids, attention_mask=two_input_masks)
out.pooler_output.shape
# [2, 768], as desired
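Side note: the same batch can be built without the manual stack/squeeze by letting the tokenizer batch and pad the raw sentences itself (padding also copes with sequences of unequal length, which torch.stack alone cannot). The two sentences below are just placeholders:

batch = tokenizer(
    ['first placeholder sentence', 'second placeholder sentence'],
    padding=True,         # pad to the longest sequence in the batch
    return_tensors='pt',  # return PyTorch tensors, already batched
)
out = bert_model(batch['input_ids'], attention_mask=batch['attention_mask'])
out.pooler_output.shape
# [2, 768] as well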