Fine-tuning GPT-2 with a user-defined loss

One caveat of using your own nn.Module with Trainer is that the save function checks which kind of network is being passed via `isinstance(self.model, PreTrainedModel)`, and if it is not a PreTrainedModel (like nn.Module in this case, or in many cases where users define their own), the training stops. One thing I'd like to propose is to support both and give the user a warning that some functionality provided by modules inheriting from PreTrainedModel won't work.
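A minimal sketch of what such a fallback could look like: instead of stopping when the model is not a PreTrainedModel, warn and save the raw state dict. The `save_model` helper and `CustomGPT2` class are hypothetical names for illustration, not part of the transformers API.

```python
import os
import tempfile
import warnings

import torch
import torch.nn as nn

try:  # PreTrainedModel is only needed for the type check
    from transformers import PreTrainedModel
except ImportError:  # let the sketch run without transformers installed
    PreTrainedModel = ()  # isinstance(x, ()) is always False

def save_model(model: nn.Module, output_dir: str) -> None:
    """Save a model, warning (instead of stopping) when it is a plain
    nn.Module rather than a PreTrainedModel."""
    os.makedirs(output_dir, exist_ok=True)
    if isinstance(model, PreTrainedModel):
        model.save_pretrained(output_dir)  # saves config + weights
    else:
        warnings.warn(
            "Model is not a PreTrainedModel; saving only the state_dict. "
            "PreTrainedModel features such as from_pretrained will not work."
        )
        torch.save(model.state_dict(),
                   os.path.join(output_dir, "pytorch_model.bin"))

# Hypothetical custom model with its own loss, standing in for the
# user-defined nn.Module from the discussion above.
class CustomGPT2(nn.Module):
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(4, 4)

    def forward(self, x):
        return self.head(x)

with tempfile.TemporaryDirectory() as d:
    save_model(CustomGPT2(), d)
    saved_ok = os.path.exists(os.path.join(d, "pytorch_model.bin"))
print(saved_ok)  # True
```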

So you’ll have to redefine save as well. On top of that, AutoModel.from_pretrained won’t work directly if you pass it the saved path, since it expects the checkpoint to be an instance of PreTrainedModel, so you’ll have to load the weights manually with torch.load.
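Loading manually then amounts to rebuilding the module and calling load_state_dict on the result of torch.load. Again a sketch with a hypothetical `CustomGPT2` class; the checkpoint path and class would be whatever your training run produced.

```python
import os
import tempfile

import torch
import torch.nn as nn

class CustomGPT2(nn.Module):  # hypothetical user-defined model
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(4, 4)

    def forward(self, x):
        return self.head(x)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "pytorch_model.bin")
    model = CustomGPT2()
    torch.save(model.state_dict(), path)

    # AutoModel.from_pretrained(d) would fail here: there is no config.json
    # and the class is not a PreTrainedModel. Rebuild the module instead
    # and load the weights manually:
    reloaded = CustomGPT2()
    reloaded.load_state_dict(torch.load(path))
    weights_match = torch.equal(model.head.weight, reloaded.head.weight)
print(weights_match)  # True
```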