When you print your model summary, you will see:
_________________________________________________________________
Layer (type)                         Output Shape        Param #
=================================================================
distilbert (TFDistilBertMainLayer)   multiple            66362880
dropout_19 (Dropout)                 multiple            0
classifier (Dense)                   multiple            2307
=================================================================
Total params: 66,365,187
Trainable params: 66,365,187
Non-trainable params: 0
_________________________________________________________________
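For context, a model with this layout is typically a small tf.keras.Model subclass wrapping the pretrained backbone. This is only a minimal sketch under that assumption: the Classifier class, the dropout rate, and the dummy input below are illustrative, not the original code.

import tensorflow as tf
from transformers import TFDistilBertModel

class Classifier(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # reuse the pretrained TFDistilBertMainLayer (66,362,880 params)
        self.distilbert = TFDistilBertModel.from_pretrained(
            "distilbert-base-uncased"
        ).distilbert
        self.dropout = tf.keras.layers.Dropout(0.2)
        # 3-way head: 768 * 3 weights + 3 biases = 2,307 params
        self.classifier = tf.keras.layers.Dense(3, name="classifier")

    def call(self, inputs, training=False):
        # take the hidden state at the [CLS] position as the pooled output
        hidden = self.distilbert(inputs).last_hidden_state[:, 0, :]
        return self.classifier(self.dropout(hidden, training=training))

model = Classifier()
model(tf.constant([[101, 102]]))  # run a dummy batch so summary() has built weights
model.summary()

The parameter counts line up with the configs shown further down: dim is 768 and the head has 3 units, so the classifier holds 768 × 3 + 3 = 2,307 parameters.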
The layers above are instances of tf.keras.layers.Layer,
which has a super cool method called get_config()
that returns a layer's configuration.
All you have to do is:
model.classifier.get_config()
and you will get
{'activation': 'linear',
'activity_regularizer': None,
'bias_constraint': None,
'bias_initializer': {'class_name': 'Zeros', 'config': {}},
'bias_regularizer': None,
'dtype': 'float32',
'kernel_constraint': None,
'kernel_initializer': {'class_name': 'TruncatedNormal',
'config': {'mean': 0.0, 'seed': None, 'stddev': 0.02}},
'kernel_regularizer': None,
'name': 'classifier',
'trainable': True,
'units': 3,
'use_bias': True}
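Since get_config() returns a plain Python dict, you can also grab individual fields rather than the whole thing, for example:

cfg = model.classifier.get_config()
print(cfg["units"], cfg["activation"])  # 3 linear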
The same works for model.distilbert.get_config(), which returns:
{'config': {'_name_or_path': 'distilbert-base-uncased',
'activation': 'gelu',
'add_cross_attention': False,
'architectures': ['DistilBertForMaskedLM'],
'attention_dropout': 0.1,
'bad_words_ids': None,
'bos_token_id': None,
'chunk_size_feed_forward': 0,
'cross_attention_hidden_size': None,
'decoder_start_token_id': None,
'dim': 768,
'diversity_penalty': 0.0,
'do_sample': False,
'dropout': 0.1,
'early_stopping': False,
'encoder_no_repeat_ngram_size': 0,
'eos_token_id': None,
'exponential_decay_length_penalty': None,
'finetuning_task': None,
'forced_bos_token_id': None,
'forced_eos_token_id': None,
'hidden_dim': 3072,
'id2label': {0: 'LABEL_0', 1: 'LABEL_1', 2: 'LABEL_2'},
'initializer_range': 0.02,
'is_decoder': False,
'is_encoder_decoder': False,
'label2id': {'LABEL_0': 0, 'LABEL_1': 1, 'LABEL_2': 2},
'length_penalty': 1.0,
'max_length': 20,
'max_position_embeddings': 512,
'min_length': 0,
'model_type': 'distilbert',
'n_heads': 12,
'n_layers': 6,
'no_repeat_ngram_size': 0,
'num_beam_groups': 1,
'num_beams': 1,
'num_return_sequences': 1,
'output_attentions': False,
'output_hidden_states': False,
'output_scores': False,
'pad_token_id': 0,
'prefix': None,
'problem_type': None,
'pruned_heads': {},
'qa_dropout': 0.1,
'remove_invalid_values': False,
'repetition_penalty': 1.0,
'return_dict': True,
'return_dict_in_generate': False,
'sep_token_id': None,
'seq_classif_dropout': 0.2,
'sinusoidal_pos_embds': False,
'task_specific_params': None,
'temperature': 1.0,
'tie_encoder_decoder': False,
'tie_weights_': True,
'tie_word_embeddings': True,
'tokenizer_class': None,
'top_k': 50,
'top_p': 1.0,
'torch_dtype': None,
'torchscript': False,
'transformers_version': '4.18.0',
'typical_p': 1.0,
'use_bfloat16': False,
'vocab_size': 30522},
'dtype': 'float32',
'name': 'distilbert',
'trainable': True}
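Note that here the Hugging Face model configuration sits nested under the 'config' key, so you index one level down to reach fields like n_layers:

cfg = model.distilbert.get_config()["config"]
print(cfg["n_layers"], cfg["n_heads"], cfg["dim"])  # 6 12 768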
Between them, these calls give you everything you need to interrogate the model.
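And since every layer in the model inherits from tf.keras.layers.Layer, you can dump every configuration in one loop over the same model object:

for layer in model.layers:
    print(layer.name, type(layer).__name__)
    print(layer.get_config())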