Best practices for uploading a custom model

elya5 · October 13, 2022, 11:17am

Hi everyone,

I built a custom model on top of `xlm-roberta-large` using PyTorch Lightning. The code looks roughly like this:

import pytorch_lightning as pl
import torch
from transformers import AutoModel


class Model(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.num_labels = 20

        # Pretrained encoder, frozen so that only the LSTM and classifier train
        self.model = AutoModel.from_pretrained(
            'xlm-roberta-large',
            config='xlm-roberta-large'
        )
        for param in self.model.parameters():
            param.requires_grad = False

        self.bilstm = torch.nn.LSTM(
            input_size=self.model.config.hidden_size,
            hidden_size=256,
            num_layers=1,
            bidirectional=True,
            batch_first=True,
            bias=True
        )
        # in_features=256 matches one direction's final hidden state (e.g. hn[-1]);
        # the per-token LSTM output is 2 * 256 wide because the LSTM is bidirectional
        self.classifier = torch.nn.Linear(256, self.num_labels)

    def forward(self, input_ids, attention_mask=None, labels=None):
        output = self.model(input_ids=input_ids,
                            attention_mask=attention_mask)
        output, (hn, cn) = self.bilstm(output.last_hidden_state)
        ...

Now I wanted to upload the model to the Hugging Face Hub, and I managed to do so with this boilerplate code:

from transformers import PretrainedConfig, PreTrainedModel


class SubjectClassifierConfig(PretrainedConfig):
    def __init__(
        self,
        **kwargs,
    ):
        super().__init__(**kwargs)


class SubjectClassifier(PreTrainedModel):
    config_class = SubjectClassifierConfig

    def __init__(self, config):
        super().__init__(config)
        self.model = Model()

    def forward(self, *args, **kwargs):
        # Call the module rather than .forward() so that PyTorch hooks still run
        return self.model(*args, **kwargs)

However, it’s not clear to me:

What should go into the config?
Should I include the arguments I used for the tokenizer somewhere? If so, where?
Should I modify the forward method of the SubjectClassifier class so that it also does the tokenization and converts the model's return value to a class label?
Should I include the code for the model (and training) when uploading? The model weights themselves are not particularly useful without the code, I suppose.
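
To make the first question concrete, here is a minimal sketch of one possible answer, not a definitive one: the config can carry the hyperparameters that are currently hard-coded in `Model.__init__`, so that the architecture can be rebuilt from `config.json` alone. The attribute names `base_model_name`, `lstm_hidden_size`, `lstm_num_layers`, and `freeze_base_model`, as well as the `model_type` string, are hypothetical names chosen for illustration. `num_labels` is passed through `**kwargs` because `PretrainedConfig` already handles it. Tokenizer settings are not shown; a tokenizer normally writes its own files when it is saved with `save_pretrained`.

```python
import tempfile

from transformers import PretrainedConfig


class SubjectClassifierConfig(PretrainedConfig):
    # Identifier stored in config.json; "subject-classifier" is a made-up name.
    model_type = "subject-classifier"

    def __init__(
        self,
        base_model_name="xlm-roberta-large",  # backbone checkpoint used in the post
        lstm_hidden_size=256,                 # LSTM hidden size used in the post
        lstm_num_layers=1,
        freeze_base_model=True,               # the post freezes all backbone parameters
        **kwargs,
    ):
        # Hypothetical attribute names; every value set here ends up in config.json.
        self.base_model_name = base_model_name
        self.lstm_hidden_size = lstm_hidden_size
        self.lstm_num_layers = lstm_num_layers
        self.freeze_base_model = freeze_base_model
        # num_labels, id2label, label2id, etc. are handled by the base class.
        super().__init__(**kwargs)


# Round trip through disk to check that the custom fields survive serialization.
config = SubjectClassifierConfig(num_labels=20)
with tempfile.TemporaryDirectory() as tmp_dir:
    config.save_pretrained(tmp_dir)  # writes config.json into tmp_dir
    reloaded = SubjectClassifierConfig.from_pretrained(tmp_dir)
```

With a config like this, `Model.__init__` could read `config.lstm_hidden_size` and `config.num_labels` instead of literal values, and the default `id2label` mapping stored by the base class could later be replaced with real label names to turn a predicted index into a class label.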