Custom GPT2 Model won't load after training

Environment info

  • transformers version: 4.10.2
  • Platform: Linux-5.11.0-34-generic-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • PyTorch version (GPU?): 1.8.1+cu102 (True)
  • Tensorflow version (GPU?): 2.4.1 (False)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Information

Model I am using: GPT2PreTrainedModel.

The problem arises when using:

  • [ ] the official example scripts: (give details below)
  • [ x ] my own modified scripts: (give details below)

The task I am working on is:

  • [ ] an official GLUE/SQuAD task: (give the name)
  • [ x ] my own task or dataset: (give details below)

The Problem

I was able to train my custom-built model, but I am not able to load it with the from_pretrained() function. BTW, I don't save the model manually, in case that matters; the saving is done by the Hugging Face Trainer.
The error message:

    model = CustomGPTModel.from_pretrained("results/checkpoint-19065", config=config)
File "/home/flo/PycharmProjects/EET2/venv/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1325, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
TypeError: __init__() missing 1 required positional argument: 'config'
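The last traceback line shows that from_pretrained builds the model by calling cls(config, *model_args, **model_kwargs), with the config as the first positional argument. With CustomGPTModel's (model, config) signature, the config lands in model and config is left unfilled. The plain-Python sketch below uses hypothetical classes (FakePreTrainedModel, TwoArgModel, OneArgModel) as a simplified stand-in for that call, not the real transformers implementation, and reproduces the same TypeError:

```python
class FakePreTrainedModel:
    """Simplified stand-in for PreTrainedModel; not the real transformers code."""

    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, config=None, **model_kwargs):
        # Loading of config and weights is omitted; only the constructor call matters here.
        # Same call as in the traceback: cls(config, *model_args, **model_kwargs)
        return cls(config, *model_args, **model_kwargs)


class TwoArgModel(FakePreTrainedModel):
    # Same signature as CustomGPTModel.__init__(self, model, config)
    def __init__(self, model, config):
        self.model = model
        self.config = config


class OneArgModel(FakePreTrainedModel):
    # Single-config signature, like the models shipped with Transformers
    def __init__(self, config):
        self.config = config


def try_load(model_cls):
    """Return "ok" on success, otherwise the TypeError message."""
    try:
        model_cls.from_pretrained("results/checkpoint-19065", config={"num_labels": 2})
        return "ok"
    except TypeError as err:
        return str(err)


print(try_load(TwoArgModel))  # message ends with: missing 1 required positional argument: 'config'
print(try_load(OneArgModel))  # ok
```

The message prefix differs slightly between Python versions (3.10+ includes the class name), but the missing 'config' argument is the same one reported in the traceback.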

I load the model like this:

config = AutoConfig.from_pretrained("results/checkpoint-19065")

model = CustomGPTModel.from_pretrained("dbmdz/german-gpt2", config=config)
# custom = CustomGPTModel(model=model, config=config)
training_args = TrainingArguments(
    output_dir='./results',  # output directory
    per_device_train_batch_size=1,  # batch size per device during training
    per_device_eval_batch_size=1,  # batch size for evaluation
    logging_dir='./logs/event/',  # directory for storing logs
)
trainer = Trainer(
    model=model,  # the instantiated 🤗 Transformers model to be trained
    # model=custom,  # the instantiated 🤗 Transformers model to be trained

    args=training_args,  # training arguments, defined above
    compute_metrics=compute_everything,
    )
trainer.predict(test_dataset=test_dataset)

As you can tell from the commented code, I tried a lot of different approaches to no avail.
Other approaches I tried:

config = AutoConfig.from_pretrained("results/checkpoint-19065")
model = CustomGPTModel.from_pretrained("results/checkpoint-19065", config=config)
# or 
config = AutoConfig.from_pretrained("results/checkpoint-19065")
model = CustomGPTModel.from_pretrained("results/checkpoint-19065") 

So the question is: how do I load my custom model?
I think it is because of the way I initialize the CustomGPTModel (see below).

The Task / More Information on what I am Doing

I am training “dbmdz/german-gpt2” on a multi-label classification task. For this I had to create my own model by subclassing GPT2PreTrainedModel. This is what the model looks like:

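import torch
from torch import nn
from torch.nn import BCEWithLogitsLoss
from transformers import GPT2PreTrainedModel

class CustomGPTModel(GPT2PreTrainedModel):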
    def __init__(self, model, config):
        super(CustomGPTModel, self).__init__(config)
        self.num_labels = config.num_labels
        self.init_weights()

        ### Architecture:
        self.transformer = model
        self.linear1 = nn.Linear(config.n_embd, 256)
        self.score = nn.Linear(256, self.num_labels, bias=False)
        self.dropout = nn.Dropout(p=0.2)
        self.sig = nn.Sigmoid()
        self.relu = nn.ReLU()

        # Model parallel
        self.model_parallel = False
        self.device_map = None
    def forward(self, input_ids=None, past_key_values=None, attention_mask=None,
                token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None,
                labels=None, use_cache=None, output_attentions=None, output_hidden_states=None,
                return_dict=None, ):

        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        
        transformer_outputs = self.transformer(
              input_ids,
              past_key_values=past_key_values,
              attention_mask=attention_mask,
              token_type_ids=token_type_ids,
              position_ids=position_ids,
              head_mask=head_mask,
              inputs_embeds=inputs_embeds,
              use_cache=use_cache,
              output_attentions=output_attentions,
              output_hidden_states=output_hidden_states,
              return_dict=return_dict,
          )
        hidden_states = transformer_outputs[0]  # call model
        hdn_2 = self.linear1(hidden_states)  # first linear
        logits = self.score(self.dropout(self.relu(hdn_2)))  # apply activation/dropout and final layer
        
        if input_ids is not None:
            batch_size, sequence_length = input_ids.shape[:2]
        else:
            batch_size, sequence_length = inputs_embeds.shape[:2]
        
        assert (
                self.config.pad_token_id is not None or batch_size == 1
        ), "Cannot handle batch sizes > 1 if no padding token is defined."
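        if self.config.pad_token_id is None:
            sequence_lengths = -1
        else:
            if input_ids is not None:
                # Index of the last non-pad token in each sequence (assumes right padding)
                sequence_lengths = torch.ne(input_ids, self.config.pad_token_id).sum(-1) - 1
            else:
                # Padding cannot be detected from inputs_embeds; fall back to the last position
                sequence_lengths = -1
        pooled_logits = logits[range(batch_size), sequence_lengths]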
        loss = None
        if labels is not None:
            loss_fct = BCEWithLogitsLoss()
            loss = loss_fct(pooled_logits.view(-1, self.num_labels), labels.view(-1, self.num_labels))
            return (loss, pooled_logits)
        else:
            return logits
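The pooling step in forward keeps, for each sequence, the logits at the last non-padding token: torch.ne(input_ids, pad_token_id).sum(-1) counts the non-pad tokens, and subtracting one turns that count into an index. A small worked example with made-up token ids, a hypothetical pad id of 0 and fake logits shows the indexing. The count only equals the last real position when sequences are right-padded and the pad id never occurs as a real token:

```python
import torch

# Made-up example: batch of 2 right-padded sequences, hypothetical pad id 0.
pad_token_id = 0
input_ids = torch.tensor([[5, 6, 7, 0, 0],
                          [8, 9, 0, 0, 0]])
batch_size, seq_len = input_ids.shape
num_labels = 3

# Fake per-token logits with shape (batch_size, seq_len, num_labels).
logits = torch.arange(batch_size * seq_len * num_labels, dtype=torch.float).view(
    batch_size, seq_len, num_labels
)

# Index of the last non-pad token in each row (correct only for right padding).
sequence_lengths = torch.ne(input_ids, pad_token_id).sum(-1) - 1
# Pick one row of logits per sequence: shape (batch_size, num_labels).
pooled_logits = logits[range(batch_size), sequence_lengths]

print(sequence_lengths.tolist())  # [2, 1]
print(pooled_logits.tolist())     # [[6.0, 7.0, 8.0], [18.0, 19.0, 20.0]]
```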

Here I initialize the model for training:

training_args = TrainingArguments(
        output_dir='./results',  # output directory
        num_train_epochs=10,  # total number of training epochs
        per_device_train_batch_size=1,  # batch size per device during training
        per_device_eval_batch_size=1,  # batch size for evaluation

        warmup_steps=500,  # number of warmup steps for learning rate scheduler
        weight_decay=0.01,  # strength of weight decay
        logging_dir='./logs/event/',  # directory for storing logs
        logging_steps=1000,
        load_best_model_at_end=True,
        evaluation_strategy="epoch",  # evaluate (and log) at the end of every epoch
        save_strategy="epoch",

        # logging_first_step = True,
        do_eval=True,
    )
trainer = Trainer(
        model=custom_gpt2,                         # the instantiated 🤗 Transformers model to be trained
        args=training_args,                  # training arguments, defined above
        train_dataset=train_dataset,         # training dataset
        eval_dataset=val_dataset,             # evaluation dataset
        compute_metrics=compute_everything,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
    )
trainer.train() 

Expected behavior

The model should load from the checkpoint without errors.

I have been trying to fix this for two days now, so opening an issue is my last resort. Hopefully someone can explain what I am doing wrong 😅 If anyone needs more information, please tell me!


You should either:

  • use the regular torch.load to load the weights of your model, or
  • if you want to use from_pretrained, make sure your custom model class subclasses PreTrainedModel and is initialized with a single config argument, like all Transformers models.
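A minimal sketch of both options, assuming the backbone is built inside the class from the config with GPT2Model(config) instead of being passed in as model. The tiny GPT2Config sizes are arbitrary so it runs quickly on CPU, the forward pass is simplified (no padding handling, no loss), and the weights.pt filename is hypothetical. It round-trips the weights through save_pretrained/from_pretrained and through plain torch.save/torch.load:

```python
import os
import tempfile

import torch
from torch import nn
from transformers import GPT2Config, GPT2Model, GPT2PreTrainedModel


class CustomGPTModel(GPT2PreTrainedModel):
    """GPT-2 backbone plus a small multi-label head; __init__ takes only a config."""

    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        # Build the backbone from the config instead of receiving a model object.
        self.transformer = GPT2Model(config)
        self.linear1 = nn.Linear(config.n_embd, 256)
        self.score = nn.Linear(256, self.num_labels, bias=False)
        self.dropout = nn.Dropout(p=0.2)
        self.relu = nn.ReLU()
        # Initialize weights only after every submodule exists.
        # post_init() exists in newer transformers; 4.10 only has init_weights().
        if hasattr(self, "post_init"):
            self.post_init()
        else:
            self.init_weights()

    def forward(self, input_ids, attention_mask=None):
        hidden_states = self.transformer(input_ids, attention_mask=attention_mask)[0]
        logits = self.score(self.dropout(self.relu(self.linear1(hidden_states))))
        # Simplified pooling: take the last position (assumes no padding).
        return logits[:, -1, :]


# Arbitrary tiny sizes so the sketch runs quickly on CPU.
config = GPT2Config(vocab_size=50, n_positions=32, n_embd=16, n_layer=1, n_head=2, num_labels=3)
model = CustomGPTModel(config).eval()
input_ids = torch.randint(0, config.vocab_size, (2, 5))

with tempfile.TemporaryDirectory() as tmp:
    # Option 1: from_pretrained can call cls(config) because __init__ needs only the config.
    model.save_pretrained(tmp)
    reloaded = CustomGPTModel.from_pretrained(tmp).eval()

    # Option 2: plain torch.save / torch.load of the state dict (hypothetical filename).
    weights_path = os.path.join(tmp, "weights.pt")
    torch.save(model.state_dict(), weights_path)
    manual = CustomGPTModel(config)
    manual.load_state_dict(torch.load(weights_path, map_location="cpu"))
    manual.eval()

with torch.no_grad():
    out_original = model(input_ids)
    out_reloaded = reloaded(input_ids)
    out_manual = manual(input_ids)

print(out_original.shape)  # torch.Size([2, 3])
```

To start from the pretrained backbone, a call like CustomGPTModel.from_pretrained("dbmdz/german-gpt2", num_labels=...) should load the GPT-2 weights into self.transformer (the attribute name matches GPT2PreTrainedModel's base_model_prefix, "transformer") and randomly initialize the new head. A Trainer checkpoint of this class could then be loaded with CustomGPTModel.from_pretrained("results/checkpoint-19065"). For the torch.load route on an existing checkpoint, the weights file in transformers 4.10 is typically pytorch_model.bin, while newer versions may write model.safetensors instead; loading only works if the state-dict keys match the class's attribute names.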