GPT2 Training History?


I’m trying to train a GPT2 model (actually GPT2LMHeadModel) using TensorFlow 2.

In this post the author shows in great detail how to train GPT2 in a new language. By following his guide I was able to train a new non-English GPT2 model from scratch. However, after the training was done I couldn’t visualize the training result. Not because of an error, but because I don’t know how.

After defining the optimizer, the loss function, and the metric:

# defining our optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
# defining our loss function
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# defining our metric which we want to observe
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
# compiling the model
model.compile(optimizer=optimizer, loss=[loss, *[None] * model.config.n_layer], metrics=[metric])

I start training:

num_epoch = 10
history = model.fit(dataset, epochs=num_epoch)

Now, my questions are:

  1. It only uses a training dataset. How do I evaluate the model? Don’t I need a validation dataset? If yes, how do I feed the validation dataset to `model.fit()`?

  2. How do I interpret the training history? I want to draw a graph of training loss and accuracy together with validation loss and accuracy.
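For reference, here is the pattern I think I’m after, sketched with a tiny stand-in Keras model (the toy Dense model, the random data shapes, and the file name `loss.png` are just assumptions for illustration; as far as I understand, the same `fit`/`History` API would apply to the compiled GPT2 model):

```python
import numpy as np
import tensorflow as tf
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is needed
import matplotlib.pyplot as plt

# Toy stand-in data and model; the compiled GPT2 model follows the same pattern.
x_train, y_train = np.random.rand(64, 4), np.random.randint(0, 2, 64)
x_val, y_val = np.random.rand(16, 4), np.random.randint(0, 2, 16)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(2),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy("accuracy")],
)

# Passing validation_data makes Keras record val_loss / val_accuracy per epoch.
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=3, verbose=0)

# history.history is a dict of per-epoch lists, ready to plot.
plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="val loss")
plt.xlabel("epoch")
plt.legend()
plt.savefig("loss.png")
```

Is it really as simple as adding `validation_data` to `model.fit()` and reading `history.history`, or does the multi-output GPT2 setup need something extra?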

Thank you for your time.