Loading a checkpoint from training GPT2LMHeadModel

I have trained a GPT2LMHeadModel and am now looking to load that checkpoint separately and serialize it. Before serializing, I just want to run a couple of inputs through the model to inspect the quality of its inference.

```python
from transformers import GPT2LMHeadModel

train_path = "/etc/runs/checkpoint-10000"
model = GPT2LMHeadModel.from_pretrained(train_path)
```
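
As a sanity check on the loading mechanism itself, a save/reload round trip with a small randomly initialized model works fine for me (note that `AutoModelWithLMHead` is deprecated; `AutoModelForCausalLM` is the current equivalent). Sizes below are arbitrary, not the ones from my config:

```python
import tempfile

from transformers import AutoModelForCausalLM, GPT2Config, GPT2LMHeadModel

# Save a small random GPT-2 (a stand-in for my real checkpoint directory),
# then reload it the same way I load the checkpoint.
config = GPT2Config(vocab_size=484, n_embd=64, n_head=4, n_layer=2)
with tempfile.TemporaryDirectory() as tmp:
    GPT2LMHeadModel(config).save_pretrained(tmp)
    model = AutoModelForCausalLM.from_pretrained(tmp)

print(type(model).__name__)  # GPT2LMHeadModel
```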

If I understand the docs correctly, `from_pretrained` should be able to pull the model config as well as the weights from that directory. Inside the directory is a `config.json` with the following contents:

  "activation_function": "gelu_new",
  "architectures": [
  "attn_pdrop": 0.1,
  "bos_token_id": 1,
  "embd_pdrop": 0.1,
  "eos_token_id": 2,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_embd": 512,
  "n_head": 8,
  "n_inner": 2048,
  "n_layer": 8,
  "n_positions": 2048,
  "padding_token_id": 0,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "torch_dtype": "float32",
  "transformers_version": "4.29.2",
  "use_cache": true,
  "vocab_size": 484

However, when I run generation with the loaded model, I get several warnings:

```
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.

Input length of input_ids is 512, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
```
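
I can make both warnings go away by passing everything explicitly at `generate` time, but that feels like a workaround. A minimal sketch of what I mean, using a small randomly initialized model and a dummy prompt as stand-ins for my real checkpoint and inputs:

```python
import torch

from transformers import GPT2Config, GPT2LMHeadModel

# Small random GPT-2 standing in for the checkpoint (arbitrary sizes).
config = GPT2Config(
    vocab_size=484, n_embd=64, n_head=4, n_layer=2, n_positions=2048,
    bos_token_id=1, eos_token_id=2, pad_token_id=0,
)
model = GPT2LMHeadModel(config)
model.eval()

# Dummy prompt of token ids; the attention mask marks real tokens (1) vs padding (0).
input_ids = torch.tensor([[1, 10, 20, 30]])
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        attention_mask=attention_mask,     # silences the attention-mask warning
        pad_token_id=config.eos_token_id,  # silences the pad-token warning
        max_new_tokens=16,                 # bounds new tokens instead of total max_length
    )
print(output_ids.shape)  # (1, prompt length + up to 16 new tokens)
```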

This is odd, since my config defines both a padding token id and `n_positions`. So my central question is: should I be loading the model in a different manner? Why isn't it picking up these critical pieces of information from my config?
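
To narrow this down, I rebuilt a minimal version of my config in a temporary directory (a reduced stand-in for the real file, keeping the same pad-related key) and checked what `from_pretrained` actually exposes:

```python
import json
import os
import tempfile

from transformers import GPT2Config

# Minimal stand-in for my checkpoint's config.json.
cfg = {
    "model_type": "gpt2",
    "vocab_size": 484,
    "n_positions": 2048,
    "bos_token_id": 1,
    "eos_token_id": 2,
    "padding_token_id": 0,
}
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "config.json"), "w") as f:
        json.dump(cfg, f)
    config = GPT2Config.from_pretrained(tmp)

print(config.pad_token_id)  # prints None
```

So the `padding_token_id` key doesn't seem to land on the standard `pad_token_id` attribute, though I'm not sure whether that's expected behavior or a problem with my config.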

Thank you in advance!