DistilBertModel to sequence classification

re90 · January 23, 2023, 1:00pm

Hi,

I’m trying to build a sequence classifier on top of DistilBertModel with max_position_embeddings=1024 (otherwise I would have used DistilBertForSequenceClassification, which defaults to max_position_embeddings=512).

I define the model in the following way:

 from transformers import DistilBertConfig, DistilBertModel
 configuration = DistilBertConfig(max_position_embeddings=1024)
 model = DistilBertModel(configuration)

When forwarding an input to the model in the following way:

output = model(ids, attention_mask = mask, return_dict=False)[0]

where ids.shape = (batch_size, 1024) and mask.shape = (batch_size, 1024), the output has shape (batch_size, 1024, 768).

My question is: what is the best practice for converting this output into a probability vector over the labels, so that the final output has shape (batch_size, num_labels)?

I thought of a few options including flattening the current output + an additional FC layer, but I’m not sure this is the best practice.
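For comparison with the flattening idea, here is a minimal sketch of a pooling-based head. Flattening (1024, 768) gives 786,432 features per example and ties the FC layer to a fixed sequence length, whereas pooling over tokens first keeps the head at hidden size 768. The class name `PooledClassificationHead`, the `pooling` argument, and the dropout value are assumptions made for illustration; the first-token variant mirrors the general pattern of a first-token representation followed by a small MLP, and the masked mean is a common alternative rather than anything prescribed by the library.

```python
import torch
from torch import nn


class PooledClassificationHead(nn.Module):
    """Hypothetical head: pool hidden states over tokens, then classify."""

    def __init__(self, hidden_dim: int = 768, num_labels: int = 2, dropout: float = 0.2):
        super().__init__()
        self.pre_classifier = nn.Linear(hidden_dim, hidden_dim)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, hidden_states, attention_mask=None, pooling="first"):
        # hidden_states: (batch_size, seq_len, hidden_dim)
        if pooling == "first":
            # Use the representation of the first token as the summary.
            pooled = hidden_states[:, 0]
        elif attention_mask is None:
            # Plain mean over all positions when no mask is given.
            pooled = hidden_states.mean(dim=1)
        else:
            # Mean over non-padded positions only.
            mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)
            pooled = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        x = torch.relu(self.pre_classifier(pooled))
        x = self.dropout(x)
        return self.classifier(x)  # logits: (batch_size, num_labels)


# Random tensor standing in for the DistilBertModel output described above.
batch_size, seq_len, hidden_dim, num_labels = 2, 1024, 768, 3
hidden_states = torch.randn(batch_size, seq_len, hidden_dim)
mask = torch.ones(batch_size, seq_len, dtype=torch.long)

head = PooledClassificationHead(hidden_dim, num_labels).eval()
with torch.no_grad():
    logits = head(hidden_states, mask, pooling="mean")
probs = torch.softmax(logits, dim=-1)  # (batch_size, num_labels)
```

During training the logits would normally go straight into `nn.CrossEntropyLoss`, which applies log-softmax internally; the explicit softmax is only needed when probabilities are wanted at inference time.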

Thank you in advance
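A related sketch, in case it helps anyone reading: DistilBertForSequenceClassification can also be constructed from a custom DistilBertConfig, so the 512 default is not a hard limit of that class. The tiny layer sizes and vocabulary below are assumptions chosen only to keep the check fast, and the model is randomly initialized exactly like DistilBertModel(configuration) above. Pretrained DistilBERT checkpoints were trained with 512 positions, so positions beyond that would not come with pretrained weights.

```python
import torch
from transformers import DistilBertConfig, DistilBertForSequenceClassification

# Small illustrative sizes (assumptions); only max_position_embeddings=1024
# and num_labels matter for the point being made.
config = DistilBertConfig(
    vocab_size=1000,
    max_position_embeddings=1024,
    n_layers=1,
    n_heads=2,
    dim=64,
    hidden_dim=128,
    num_labels=3,
)
model = DistilBertForSequenceClassification(config).eval()

# Dummy batch with the sequence length from the question.
ids = torch.randint(0, config.vocab_size, (2, 1024))
mask = torch.ones_like(ids)

with torch.no_grad():
    logits = model(input_ids=ids, attention_mask=mask).logits

print(logits.shape)  # (batch_size, num_labels)
```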
