I have completed a training run with a GPT2LMHeadModel, and I'm now looking to load that model separately and serialize it. Before serializing, I just want to run a couple of inputs through it to inspect the quality of the inference.
from transformers import GPT2LMHeadModel

train_path = "/etc/runs/checkpoint-10000"
model = GPT2LMHeadModel.from_pretrained(train_path)
If I understand the docs correctly, from_pretrained should be able to pull both the model config and the weights from that directory. Inside the directory is a config.json with the following contents:
{
"activation_function": "gelu_new",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 1,
"embd_pdrop": 0.1,
"eos_token_id": 2,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_embd": 512,
"n_head": 8,
"n_inner": 2048,
"n_layer": 8,
"n_positions": 2048,
"padding_token_id": 0,
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "float32",
"transformers_version": "4.29.2",
"use_cache": true,
"vocab_size": 484
}
However, when running it, I get several errors/warnings:
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input’s attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id: 2 for open-end generation.
and:
Input length of input_ids is 512, but max_length is set to 20. This can lead to unexpected behavior. You should consider increasing max_new_tokens.
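If I'm reading the warnings right, generate wants an explicit attention_mask, an explicit pad_token_id, and a larger max_new_tokens. This is roughly the call I've been experimenting with to quiet them (a sketch only: the prompt is a placeholder, and I'm assuming the tokenizer files were saved alongside the checkpoint):

from transformers import GPT2LMHeadModel, GPT2TokenizerFast

train_path = "/etc/runs/checkpoint-10000"
model = GPT2LMHeadModel.from_pretrained(train_path)
tokenizer = GPT2TokenizerFast.from_pretrained(train_path)  # assumes tokenizer files live in the checkpoint dir

inputs = tokenizer("example prompt", return_tensors="pt")
output_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # addresses the attention-mask warning
    pad_token_id=model.config.eos_token_id,   # explicit pad id instead of letting generate guess
    max_new_tokens=50,                        # addresses the max_length=20 warning
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

But even if passing these values explicitly silences the warnings, it doesn't explain why they aren't being read from config.json in the first place.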
This is odd, since my config defines both a pad token and n_positions. So my central question is: should I be loading the model in a different way? Why isn’t it picking up these critical pieces of info from my config?
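For what it's worth, a quick way to see what the loaded config actually reports would be something like this (I'm assuming model.config is the right place to look):

from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("/etc/runs/checkpoint-10000")
print(model.config.pad_token_id)  # I expected this to be filled from "padding_token_id" in config.json
print(model.config.n_positions)   # I expected 2048 from config.json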
Thank you in advance!