## Environment info

- `transformers` version: 4.10.2
- Platform: Linux-5.11.0-34-generic-x86_64-with-glibc2.29
- Python version: 3.8.10
- PyTorch version (GPU?): 1.8.1+cu102 (True)
- Tensorflow version (GPU?): 2.4.1 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
## Information

Model I am using: a custom subclass of `GPT2PreTrainedModel` (see below).
The problem arises when using:

- [ ] the official example scripts: (give details below)
- [x] my own modified scripts: (give details below)

The task I am working on is:

- [ ] an official GLUE/SQuAD task: (give the name)
- [x] my own task or dataset: (give details below)
## The Problem
I was able to train my custom-built model, but I am not able to load it with the `from_pretrained()` function. Note that I don't save the model manually, in case that matters; the saving is done by the Hugging Face `Trainer`.

The error message:
```
model = CustomGPTModel.from_pretrained("results/checkpoint-19065", config=config)
  File "/home/flo/PycharmProjects/EET2/venv/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1325, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
TypeError: __init__() missing 1 required positional argument: 'config'
```
I load the model like this:
```python
config = AutoConfig.from_pretrained("results/checkpoint-19065")
model = CustomGPTModel.from_pretrained("dbmdz/german-gpt2", config=config)
# custom = CustomGPTModel(model=model, config=config)

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    per_device_train_batch_size=1,   # batch size per device during training
    per_device_eval_batch_size=1,    # batch size for evaluation
    logging_dir='./logs/event/',     # directory for storing logs
)

trainer = Trainer(
    model=model,                     # the instantiated 🤗 Transformers model to be trained
    # model=custom,                  # the instantiated 🤗 Transformers model to be trained
    args=training_args,              # training arguments, defined above
    compute_metrics=compute_everything,
)

trainer.predict(test_dataset=test_dataset)
```
As you can tell from the commented-out code, I tried a lot of different approaches, to no avail.

Other approaches I tried:
```python
config = AutoConfig.from_pretrained("results/checkpoint-19065")
model = CustomGPTModel.from_pretrained("results/checkpoint-19065", config=config)

# or

config = AutoConfig.from_pretrained("results/checkpoint-19065")
model = CustomGPTModel.from_pretrained("results/checkpoint-19065")
```
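Beyond that, the only workaround I can think of is to rebuild the model the same way as for training and load the fine-tuned weights from the checkpoint by hand. This is just an untested sketch of that idea (I would much rather use `from_pretrained()` properly):

```python
import torch
from transformers import AutoConfig, AutoModel

# Untested sketch of a manual workaround: rebuild the wrapper exactly as for
# training, then load the fine-tuned weights straight from the checkpoint file
# instead of going through from_pretrained().
config = AutoConfig.from_pretrained("results/checkpoint-19065")
base = AutoModel.from_pretrained("dbmdz/german-gpt2", config=config)
model = CustomGPTModel(model=base, config=config)

state_dict = torch.load("results/checkpoint-19065/pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
```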
Anyway, the question is: how do I load my custom model?
I think the problem is the way I initialize `CustomGPTModel` (see below).
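If I read the traceback correctly, `from_pretrained()` instantiates the class via `cls(config, *model_args, **model_kwargs)` (line 1325 of `modeling_utils.py`), i.e. `config` is the only required positional argument it supplies, while my `__init__` expects `model` first and `config` second. A stripped-down sketch of that mismatch (the toy classes below are mine, not the real transformers code):

```python
class ToyPreTrainedModel:
    @classmethod
    def from_pretrained(cls, path, config=None, *model_args, **model_kwargs):
        # Simplified stand-in for what the traceback shows at modeling_utils.py:1325:
        # the config is passed as the only positional argument.
        return cls(config, *model_args, **model_kwargs)

class ToyCustomModel(ToyPreTrainedModel):
    def __init__(self, model, config):  # same signature as my CustomGPTModel.__init__
        self.model = model
        self.config = config

# `config` gets bound to the `model` parameter, and `config` itself stays unfilled:
ToyCustomModel.from_pretrained("results/checkpoint-19065", config={"n_embd": 768})
# -> TypeError: __init__() missing 1 required positional argument: 'config'
```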
## The Task / More Information on What I Am Doing
I am training `dbmdz/german-gpt2` on a multilabel classification task. For this I had to create my own model by subclassing `GPT2PreTrainedModel`. This is what the model looks like:
```python
import torch
from torch import nn
from torch.nn import BCEWithLogitsLoss
from transformers import GPT2PreTrainedModel


class CustomGPTModel(GPT2PreTrainedModel):
    def __init__(self, model, config):
        super(CustomGPTModel, self).__init__(config)
        self.num_labels = config.num_labels
        self.init_weights()
        ### Architecture:
        self.transformer = model
        self.linear1 = nn.Linear(config.n_embd, 256)
        self.score = nn.Linear(256, self.num_labels, bias=False)
        self.dropout = nn.Dropout(p=0.2)
        self.sig = nn.Sigmoid()
        self.relu = nn.ReLU()
        # Model parallel
        self.model_parallel = False
        self.device_map = None

    def forward(self, input_ids=None, past_key_values=None, attention_mask=None,
                token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None,
                labels=None, use_cache=None, output_attentions=None, output_hidden_states=None,
                return_dict=None):
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        transformer_outputs = self.transformer(
            input_ids,
            past_key_values=past_key_values,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        hidden_states = transformer_outputs[0]                # call the base model
        hdn_2 = self.linear1(hidden_states)                   # first linear layer
        logits = self.score(self.dropout(self.relu(hdn_2)))   # activation/dropout and final layer

        if input_ids is not None:
            batch_size, sequence_length = input_ids.shape[:2]
        else:
            batch_size, sequence_length = inputs_embeds.shape[:2]

        assert (
            self.config.pad_token_id is not None or batch_size == 1
        ), "Cannot handle batch sizes > 1 if no padding token is defined."
        if self.config.pad_token_id is None:
            sequence_lengths = -1
        else:
            if input_ids is not None:
                sequence_lengths = torch.ne(input_ids, self.config.pad_token_id).sum(-1) - 1

        # pool the logits at the last non-padding token of each sequence
        pooled_logits = logits[range(batch_size), sequence_lengths]

        loss = None
        if labels is not None:
            loss_fct = BCEWithLogitsLoss()
            loss = loss_fct(pooled_logits.view(-1, self.num_labels), labels.view(-1, self.num_labels))
            return (loss, pooled_logits)
        else:
            return logits
```
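For completeness, the model instance passed to the `Trainer` below (`custom_gpt2`) is built roughly like this; the snippet is a simplification of my actual script, and the base model is loaded separately and handed into the wrapper, just like in the commented-out line further up:

```python
from transformers import AutoConfig, AutoModel

# Roughly how `custom_gpt2` is created for training (simplified from my script);
# num_labels is set to a placeholder value here.
config = AutoConfig.from_pretrained("dbmdz/german-gpt2", num_labels=8)
base_model = AutoModel.from_pretrained("dbmdz/german-gpt2", config=config)
custom_gpt2 = CustomGPTModel(model=base_model, config=config)
```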
And here is how I set up training:
```python
training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=10,             # total number of training epochs
    per_device_train_batch_size=1,   # batch size per device during training
    per_device_eval_batch_size=1,    # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs/event/',     # directory for storing logs
    logging_steps=1000,
    load_best_model_at_end=True,
    evaluation_strategy="epoch",     # evaluation is done (and logged) every epoch
    save_strategy="epoch",
    # logging_first_step=True,
    do_eval=True,
)

trainer = Trainer(
    model=custom_gpt2,               # the instantiated 🤗 Transformers model to be trained
    args=training_args,              # training arguments, defined above
    train_dataset=train_dataset,     # training dataset
    eval_dataset=val_dataset,        # evaluation dataset
    compute_metrics=compute_everything,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)

trainer.train()
```
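For reference, the checkpoint I am trying to load (`results/checkpoint-19065`) is one of the per-epoch checkpoints written by the `Trainer` with `save_strategy="epoch"`. As far as I understand, it should contain the regular `save_pretrained()` output (`config.json`, `pytorch_model.bin`) next to the Trainer state files, which is easy to check:

```python
import os

# Quick sanity check on the checkpoint contents; I expect config.json and
# pytorch_model.bin (written by save_pretrained) plus Trainer files such as
# optimizer.pt, scheduler.pt, trainer_state.json and training_args.bin.
print(sorted(os.listdir("results/checkpoint-19065")))
```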
## Expected behavior

The model should load from the checkpoint as expected.

I have been trying to fix this for two days now, so creating an issue is my last resort. Hopefully someone can explain what I am doing wrong. If anyone needs more information, please tell me!