How can I output the structure of TFGPT2LMHeadModel?

I need to compare the network structures of GPT2LMHeadModel and TFGPT2LMHeadModel with the same parameters, so how can I output the structure of TFGPT2LMHeadModel? model.summary() doesn't achieve this; it just outputs one layer.

Hello @Orient and welcome to our Forum!

Since the TF models in transformers are not built as fully connected Keras graphs (this is what keeps conversion with the PyTorch models easy), the summary-related functions in TF don't really work: you will not be able to see the internals of the model.
Initialize the model:

from transformers import TFGPT2LMHeadModel
model = TFGPT2LMHeadModel.from_pretrained("gpt2")

When you call model.summary(), you will get:

Model: "tfgpt2lm_head_model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 transformer (TFGPT2MainLaye  multiple                 124439808 
 r)                                                              
                                                                 
=================================================================
Total params: 124,439,808
Trainable params: 124,439,808
Non-trainable params: 0
_________________________________________________________________

The classes above are instantiated from tf.keras.layers.Layer, which has a very handy method called get_config() that returns the configuration for custom layers.

All you have to do is call model.transformer.get_config() (you can see in the summary above that the layer is called transformer) and it returns the main attributes:

{'config': {'_name_or_path': 'gpt2',
  'activation_function': 'gelu_new',
  'add_cross_attention': False,
  'architectures': ['GPT2LMHeadModel'],
  'attn_pdrop': 0.1,
  'bad_words_ids': None,
  'bos_token_id': 50256,
  'chunk_size_feed_forward': 0,
  'cross_attention_hidden_size': None,
  'decoder_start_token_id': None,
  'diversity_penalty': 0.0,
  'do_sample': False,
  'early_stopping': False,
  'embd_pdrop': 0.1,
  'encoder_no_repeat_ngram_size': 0,
  'eos_token_id': 50256,
  'exponential_decay_length_penalty': None,
  'finetuning_task': None,
  'forced_bos_token_id': None,
  'forced_eos_token_id': None,
  'id2label': {0: 'LABEL_0', 1: 'LABEL_1'},
  'initializer_range': 0.02,
  'is_decoder': False,
  'is_encoder_decoder': False,
  'label2id': {'LABEL_0': 0, 'LABEL_1': 1},
  'layer_norm_epsilon': 1e-05,
  'length_penalty': 1.0,
  'max_length': 20,
  'min_length': 0,
  'model_type': 'gpt2',
  'n_ctx': 1024,
  'n_embd': 768,
  'n_head': 12,
  'n_inner': None,
  'n_layer': 12,
  'n_positions': 1024,
  'no_repeat_ngram_size': 0,
  'num_beam_groups': 1,
  'num_beams': 1,
  'num_return_sequences': 1,
  'output_attentions': False,
  'output_hidden_states': False,
  'output_scores': False,
  'pad_token_id': None,
  'prefix': None,
  'problem_type': None,
  'pruned_heads': {},
  'remove_invalid_values': False,
  'reorder_and_upcast_attn': False,
  'repetition_penalty': 1.0,
  'resid_pdrop': 0.1,
  'return_dict': True,
  'return_dict_in_generate': False,
  'scale_attn_by_inverse_layer_idx': False,
  'scale_attn_weights': True,
  'sep_token_id': None,
  'summary_activation': None,
  'summary_first_dropout': 0.1,
  'summary_proj_to_labels': True,
  'summary_type': 'cls_index',
  'summary_use_proj': True,
  'task_specific_params': {'text-generation': {'do_sample': True,
    'max_length': 50}},
  'temperature': 1.0,
  'tie_encoder_decoder': False,
  'tie_word_embeddings': True,
  'tokenizer_class': None,
  'top_k': 50,
  'top_p': 1.0,
  'torch_dtype': None,
  'torchscript': False,
  'transformers_version': '4.20.1',
  'typical_p': 1.0,
  'use_bfloat16': False,
  'use_cache': True,
  'vocab_size': 50257},
 'dtype': 'float32',
 'name': 'transformer',
 'trainable': True}

So you can investigate the layers and compare.
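
If you also want to compare against the PyTorch implementation parameter by parameter, a minimal sketch (using the standard PyTorch named_parameters() and Keras weights attributes; the pt_model/tf_model names are just illustrative) would be:

from transformers import GPT2LMHeadModel, TFGPT2LMHeadModel

pt_model = GPT2LMHeadModel.from_pretrained("gpt2")
tf_model = TFGPT2LMHeadModel.from_pretrained("gpt2")

# PyTorch side: print every parameter name and shape
for name, param in pt_model.named_parameters():
    print(name, tuple(param.shape))

# TensorFlow side: print every variable name and shape
for weight in tf_model.weights:
    print(weight.name, tuple(weight.shape))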

Thanks for your help. I have another question: how can I build the input to TFGPT2LMHeadModel? Should I shift the inputs by one element to build the labels? Is the following code right?

import tensorflow as tf

def encode_example(ds, limit=-1):
    print(len(ds))
    input_ids_list = []
    attention_mask_list = []
    label_list = []
    for row in ds:
        # Drop the last token from the inputs and attention mask...
        input_ids_list.append(row["input_ids"][:-1])
        attention_mask_list.append(row["attention_mask"][:-1])
        # ...and shift the labels left by one, replacing label id 1 with -100
        # so it is ignored by the loss
        label_list.append([-100 if k == 1 else k for k in row["labels"][1:]])
    return tf.data.Dataset.from_tensor_slices(
        (input_ids_list, attention_mask_list, label_list)
    ).map(map_example_to_dict)
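
(map_example_to_dict is not shown above; a minimal sketch of what such a helper might look like, assuming it just packs the tensors into the dict format TFGPT2LMHeadModel expects, is:)

def map_example_to_dict(input_ids, attention_mask, labels):
    # Pack the features into a dict of model inputs and return the
    # shifted labels as the target (the key names are assumptions).
    return (
        {"input_ids": input_ids, "attention_mask": attention_mask},
        labels,
    )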