Subclassing a pretrained model for a new objective

I would like to use a pretrained model as an encoder for a new task. It is essentially multiple sequence classification objectives, like in the ...ForSequenceClassification models, but with an output layer for each subtask.

I could just create wrappers around the encoder, but I’d like to subclass from PreTrainedModel to integrate better with the Trainer class. How exactly should I do this? Do I need to create a config class as well? I will at least need to supply an extra list or dict in the config specifying how many classes each subtask has.



You can definitely subclass PretrainedConfig for your custom config and PreTrainedModel for your custom model, then access all the methods of the library.
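As a minimal sketch of that pattern (the names `MultiTaskConfig`, `MultiTaskModel`, and the `encoder_name`/`task_num_classes` fields are illustrative, not library API — adapt them to your task):

```python
import torch.nn as nn
from transformers import AutoModel, PretrainedConfig, PreTrainedModel

class MultiTaskConfig(PretrainedConfig):
    model_type = "multitask-encoder"  # illustrative identifier

    def __init__(self, encoder_name="bert-base-uncased", task_num_classes=None,
                 hidden_size=768, **kwargs):
        super().__init__(**kwargs)
        self.encoder_name = encoder_name
        # e.g. {"sentiment": 3, "topic": 10}: one output layer per subtask
        self.task_num_classes = task_num_classes or {}
        self.hidden_size = hidden_size

class MultiTaskModel(PreTrainedModel):
    # lets from_pretrained instantiate the right config for this class
    config_class = MultiTaskConfig

    def __init__(self, config):
        super().__init__(config)
        # the pretrained encoder is loaded as a submodule
        self.encoder = AutoModel.from_pretrained(config.encoder_name)
        self.heads = nn.ModuleDict({
            task: nn.Linear(config.hidden_size, n)
            for task, n in config.task_num_classes.items()
        })

    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        pooled = hidden[:, 0]  # [CLS] token representation
        # one logits tensor per subtask
        return {task: head(pooled) for task, head in self.heads.items()}
```

Because the model subclasses PreTrainedModel, methods like save_pretrained and from_pretrained (and Trainer integration) come along for free.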

@sgugger thanks! But in that case what is needed to make methods like from_pretrained work out of the box? I saw that the pretrained model classes have a class attribute called config_class, is setting that enough?

That attribute is how the library finds the right config class in Transformers. In your case, you might have to use two steps:

```python
config = CustomConfig.from_pretrained(path_to_folder_with_config_and_weights)
model = CustomModel.from_pretrained(path_to_folder_with_config_and_weights, config=config)
```
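For the first step to work, the folder needs to contain a config.json written by the custom config class. As a small self-contained sketch (`CustomConfig` and its `task_num_classes` field follow the thread and are illustrative), the save/load round trip looks like:

```python
import tempfile

from transformers import PretrainedConfig

class CustomConfig(PretrainedConfig):
    model_type = "custom-multitask"  # illustrative identifier

    def __init__(self, task_num_classes=None, **kwargs):
        super().__init__(**kwargs)
        # e.g. {"sentiment": 3, "topic": 10}: classes per subtask
        self.task_num_classes = task_num_classes or {}

with tempfile.TemporaryDirectory() as folder:
    # save_pretrained writes config.json; from_pretrained reads it back
    CustomConfig(task_num_classes={"sentiment": 3}).save_pretrained(folder)
    cfg = CustomConfig.from_pretrained(folder)
```

Any extra fields you set in `__init__` are serialized to config.json, so the subtask class counts survive the round trip.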

Ok. But how can I load the pretrained model (i.e., the encoder inside my class)?
I tried doing CustomModel.from_pretrained(path_to_pretrained, additional_config_data), but that ignored all the weights in the checkpoint (name mismatches, I suppose?).

Did you save the corresponding model with save_pretrained?

Nope, I haven’t even fine-tuned the model yet :slight_smile:
I’m calling from_pretrained on the encoder directly, after creating the classifier object and before training, but that looks hacky.

I’m not sure what you want to do, but calling from_pretrained on your class with weights saved from another class will not work. If you want to use a pretrained model for part of your custom model, you should use the from_pretrained method when defining that part of your custom model.
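One way to see why: once the whole custom model has been saved with save_pretrained, the parameter names in the checkpoint match the custom class, so one-step loading works from then on. A tiny self-contained sketch of the round trip (`TinyConfig`/`TinyModel` are illustrative stand-ins; in a real model the encoder submodule would be loaded with `AutoModel.from_pretrained(...)` in `__init__`):

```python
import tempfile

import torch
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel

class TinyConfig(PretrainedConfig):
    model_type = "tiny-demo"  # illustrative identifier

    def __init__(self, hidden=4, **kwargs):
        super().__init__(**kwargs)
        self.hidden = hidden

class TinyModel(PreTrainedModel):
    config_class = TinyConfig

    def __init__(self, config):
        super().__init__(config)
        # stand-in for the encoder; a real model would call
        # AutoModel.from_pretrained(...) here to load pretrained weights
        self.encoder = nn.Linear(config.hidden, config.hidden)
        self.head = nn.Linear(config.hidden, 2)

    def forward(self, x):
        return self.head(self.encoder(x))

with tempfile.TemporaryDirectory() as folder:
    model = TinyModel(TinyConfig())
    model.save_pretrained(folder)  # checkpoint names now match TinyModel
    reloaded = TinyModel.from_pretrained(folder)  # one-step loading works
```

Loading a checkpoint saved from a *different* class would fail exactly because those parameter names would not line up.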

Would the following implementation work as expected (by which I mean use the model weights associated with the Hugging Face location parameter, etc.)? I couldn’t figure out how to subclass using AutoModelForSequenceClassification and am hoping this is valid.

```python
from typing import Optional

import torch
from transformers import AutoConfig, AutoModel, BertForSequenceClassification

# HUGGINGFACE_LOCATION and CustomLayer are assumed to be defined elsewhere

class PretrainedForSequenceCustom(BertForSequenceClassification):

    def __init__(self, num_classes):
        # initialize the parent class; this builds self.bert, self.dropout,
        # the default classifier, and runs post-initialization
        cfg = AutoConfig.from_pretrained(HUGGINGFACE_LOCATION)
        super().__init__(cfg)

        # redefine bert to use the pretrained weights loaded with AutoModel
        self.bert = AutoModel.from_pretrained(HUGGINGFACE_LOCATION)

        # redefine the classifier head for the custom number of classes
        self.classifier = CustomLayer(size_in=cfg.hidden_size, num_classes=num_classes)

    def forward(
        self,
        input_ids: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        token_type_ids: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.Tensor] = None,
        inputs_embeds: Optional[torch.Tensor] = None,
        labels: Optional[torch.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ):
        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        # pooled [CLS] representation
        pooled_output = outputs[1]

        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)

        probas = torch.sigmoid(logits)

        return logits, probas

    def change_bert_grad_mode(self, mode=None):
        # toggle (mode=None) or explicitly set requires_grad on the encoder
        for param in self.bert.parameters():
            if mode is None:
                param.requires_grad = not param.requires_grad
            else:
                param.requires_grad = mode
```