How can I output the structure of TFGPT2LMHeadModel?

I need to compare the network structures of GPT2LMHeadModel and TFGPT2LMHeadModel with the same parameters, so how can I output the structure of TFGPT2LMHeadModel? model.summary() doesn't achieve this; it just outputs one layer.

Hello @Orient and welcome to our Forum!

Since the TF models in transformers are not built as fully connected Keras graphs (this is what keeps conversion with the PyTorch models easy), the summary-related functions in TF don't really work: you will not be able to see the internals of the model.
Initialize the model:

from transformers import TFGPT2LMHeadModel
model = TFGPT2LMHeadModel.from_pretrained("gpt2")

When you call model.summary(), you will get:

Model: "tfgpt2lm_head_model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 transformer (TFGPT2MainLaye  multiple                 124439808 
 r)                                                              
                                                                 
=================================================================
Total params: 124,439,808
Trainable params: 124,439,808
Non-trainable params: 0
_________________________________________________________________

The classes above are instantiated from tf.keras.layers.Layer, which has a very handy method called get_config() that returns the configuration for custom layers.

All you have to do is call model.transformer.get_config() (you can see in the summary above that the layer is called transformer) and it returns the main attributes:

{'config': {'_name_or_path': 'gpt2',
  'activation_function': 'gelu_new',
  'add_cross_attention': False,
  'architectures': ['GPT2LMHeadModel'],
  'attn_pdrop': 0.1,
  'bad_words_ids': None,
  'bos_token_id': 50256,
  'chunk_size_feed_forward': 0,
  'cross_attention_hidden_size': None,
  'decoder_start_token_id': None,
  'diversity_penalty': 0.0,
  'do_sample': False,
  'early_stopping': False,
  'embd_pdrop': 0.1,
  'encoder_no_repeat_ngram_size': 0,
  'eos_token_id': 50256,
  'exponential_decay_length_penalty': None,
  'finetuning_task': None,
  'forced_bos_token_id': None,
  'forced_eos_token_id': None,
  'id2label': {0: 'LABEL_0', 1: 'LABEL_1'},
  'initializer_range': 0.02,
  'is_decoder': False,
  'is_encoder_decoder': False,
  'label2id': {'LABEL_0': 0, 'LABEL_1': 1},
  'layer_norm_epsilon': 1e-05,
  'length_penalty': 1.0,
  'max_length': 20,
  'min_length': 0,
  'model_type': 'gpt2',
  'n_ctx': 1024,
  'n_embd': 768,
  'n_head': 12,
  'n_inner': None,
  'n_layer': 12,
  'n_positions': 1024,
  'no_repeat_ngram_size': 0,
  'num_beam_groups': 1,
  'num_beams': 1,
  'num_return_sequences': 1,
  'output_attentions': False,
  'output_hidden_states': False,
  'output_scores': False,
  'pad_token_id': None,
  'prefix': None,
  'problem_type': None,
  'pruned_heads': {},
  'remove_invalid_values': False,
  'reorder_and_upcast_attn': False,
  'repetition_penalty': 1.0,
  'resid_pdrop': 0.1,
  'return_dict': True,
  'return_dict_in_generate': False,
  'scale_attn_by_inverse_layer_idx': False,
  'scale_attn_weights': True,
  'sep_token_id': None,
  'summary_activation': None,
  'summary_first_dropout': 0.1,
  'summary_proj_to_labels': True,
  'summary_type': 'cls_index',
  'summary_use_proj': True,
  'task_specific_params': {'text-generation': {'do_sample': True,
    'max_length': 50}},
  'temperature': 1.0,
  'tie_encoder_decoder': False,
  'tie_word_embeddings': True,
  'tokenizer_class': None,
  'top_k': 50,
  'top_p': 1.0,
  'torch_dtype': None,
  'torchscript': False,
  'transformers_version': '4.20.1',
  'typical_p': 1.0,
  'use_bfloat16': False,
  'use_cache': True,
  'vocab_size': 50257},
 'dtype': 'float32',
 'name': 'transformer',
 'trainable': True}

So you can investigate the layers and compare.
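
If you also want to compare against the PyTorch implementation parameter by parameter, a minimal sketch (using the standard PyTorch named_parameters() and Keras weights attributes; the pt_model/tf_model names are just illustrative) would be:

from transformers import GPT2LMHeadModel, TFGPT2LMHeadModel

pt_model = GPT2LMHeadModel.from_pretrained("gpt2")
tf_model = TFGPT2LMHeadModel.from_pretrained("gpt2")

# PyTorch side: print every parameter name and shape
for name, param in pt_model.named_parameters():
    print(name, tuple(param.shape))

# TensorFlow side: print every variable name and shape
for weight in tf_model.weights:
    print(weight.name, tuple(weight.shape))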

Thanks for your help. I have another question: how can I build the input to TFGPT2LMHeadModel? Should I shift the inputs by one element to build the labels? Is the following code right?

import tensorflow as tf

def encode_example(ds, limit=-1):
    print(len(ds))
    input_ids_list = []
    attention_mask_list = []
    label_list = []
    for row in ds:
        # Drop the last token from the inputs and attention mask...
        input_ids_list.append(row["input_ids"][:-1])
        attention_mask_list.append(row["attention_mask"][:-1])
        # ...and shift the labels left by one, replacing label id 1 with -100
        # so it is ignored by the loss
        label_list.append([-100 if k == 1 else k for k in row["labels"][1:]])
    return tf.data.Dataset.from_tensor_slices(
        (input_ids_list, attention_mask_list, label_list)
    ).map(map_example_to_dict)
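
(map_example_to_dict is not shown above; a minimal sketch of what such a helper might look like, assuming it just packs the tensors into the dict format TFGPT2LMHeadModel expects, is:)

def map_example_to_dict(input_ids, attention_mask, labels):
    # Pack the features into a dict of model inputs and return the
    # shifted labels as the target (the key names are assumptions).
    return (
        {"input_ids": input_ids, "attention_mask": attention_mask},
        labels,
    )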