I am writing a custom head for a transformer and would like to know how to make the model load/save correctly through the HF functions (from_pretrained()…). Specifically, I would like my model class to work with any transformer from the HF Hub.
The use case would be:
- User provides name of model on hub to from_pretrained() → load model and randomly init head layer
- User provides the path to checkpoint → load model and head weights
```python
from transformers import PreTrainedModel, AutoModel, AutoConfig
import torch.nn as nn


class MyCustomModel(PreTrainedModel):
    def __init__(self, config, transformer_model_name_or_path, num_classes):
        super().__init__(config)
        self.transformer = AutoModel.from_pretrained(transformer_model_name_or_path, config=config)
        self.classifier = nn.Sequential(
            nn.Linear(config.hidden_size, config.hidden_size),
            nn.ReLU(),
            nn.Linear(config.hidden_size, num_classes),
        )

    def forward(self, input_ids, attention_mask):
        outputs = self.transformer(input_ids, attention_mask=attention_mask)
        pooled_output = outputs.pooler_output
        logits = self.classifier(pooled_output)
        return logits

    def _init_weights(self, module):
        self.transformer._init_weights(module)


config = AutoConfig.from_pretrained("sentence-transformers/all-distilroberta-v1")
model = MyCustomModel.from_pretrained(
    "sentence-transformers/all-distilroberta-v1",
    config=config,
    transformer_model_name_or_path="sentence-transformers/all-distilroberta-v1",
    num_classes=2,
)
```
This code runs, but it does not load the weights of the distilroberta model correctly. The issue is that I want to support different base models like BERT and RoBERTa, which is why I am using the Auto classes, but this results in the weights not being loaded correctly.
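For reference, this is a minimal sketch of how I checked that the backbone weights differ from a directly loaded model (it assumes the `model` instance created above; the comparison itself is just my own diagnostic, not part of the HF API):

```python
import torch
from transformers import AutoModel

# Load the same backbone directly as a reference.
reference = AutoModel.from_pretrained("sentence-transformers/all-distilroberta-v1")
ref_state = reference.state_dict()

# Compare the custom model's transformer weights against the reference;
# any printed name indicates a tensor that was not loaded correctly.
for name, param in model.transformer.state_dict().items():
    if not torch.equal(param, ref_state[name]):
        print(f"Mismatch in {name}")
```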