Finetuning GPT2 using Multiple GPUs and Trainer

@SUNM I think that should train your model.

can I trust the final model?

One should never trust the final model blindly. One should always evaluate and inspect their model to build confidence that the final model is doing what the developer intended. There are a few ways to do this, and the approach can vary across applications.

The first thing I would check is whether or not the parameters of the model are being updated as training proceeds. You can print out the model parameters before the Trainer.train() call and again after it to see if they have changed. You don’t need to inspect each and every parameter; just pick a subset of them and see if they differ.
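Here is a minimal sketch of that check. It uses a tiny stand-in model and one hand-rolled optimizer step so it is self-contained; with a real Hugging Face model you would take the snapshot just before Trainer.train() and compare just after it returns.

```python
import torch
import torch.nn as nn

def snapshot_params(model, n=3):
    """Clone the first n named parameters so later updates can't mutate the copies."""
    items = list(model.named_parameters())[:n]
    return {name: p.detach().clone() for name, p in items}

def params_changed(model, before):
    """Return True if any snapshotted parameter differs from its saved copy."""
    current = dict(model.named_parameters())
    return any(not torch.equal(before[name], current[name]) for name in before)

# Tiny stand-in model; substitute your GPT-2 model here.
model = nn.Linear(4, 2)
before = snapshot_params(model)

# One dummy optimization step in place of Trainer.train().
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()

print(params_changed(model, before))  # True: the weights moved
```

If this prints False after a full training run, something is wrong: the optimizer may not have received the model's parameters, or the parameters may be frozen (requires_grad=False).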

The next thing I would make sure to have is the loss plot (also known as the learning curve). This is a plot of the model’s loss at each training step. Here is a detailed article describing this: https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/. What this tells you is whether or not your model is actually optimizing against the loss function. If that curve decreases over the number of training steps, one can say that the model’s predictions are getting better and better with respect to the training data and their labels.
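With Trainer, the per-step losses are collected in trainer.state.log_history as a list of dicts. A sketch of extracting and plotting them, using a hand-written list with hypothetical values in place of a real run (training records carry a "loss" key, evaluation records carry "eval_loss"):

```python
def extract_loss_curve(log_history):
    """Pull (step, loss) pairs out of Trainer-style log records."""
    return [(rec["step"], rec["loss"]) for rec in log_history if "loss" in rec]

# Shape mimics trainer.state.log_history after training (hypothetical numbers).
log_history = [
    {"loss": 3.2, "step": 10, "epoch": 0.1},
    {"loss": 2.7, "step": 20, "epoch": 0.2},
    {"loss": 2.3, "step": 30, "epoch": 0.3},
    {"eval_loss": 2.5, "step": 30, "epoch": 0.3},  # eval record, skipped here
]

points = extract_loss_curve(log_history)
steps, losses = zip(*points)
print(points)  # [(10, 3.2), (20, 2.7), (30, 2.3)]

try:
    import matplotlib.pyplot as plt
    plt.plot(steps, losses)
    plt.xlabel("training step")
    plt.ylabel("training loss")
    plt.savefig("loss_curve.png")
except ImportError:
    pass  # plotting is optional; the numbers alone show the trend
```

Note that you only get a record every logging_steps steps, so set that in your TrainingArguments to something small enough to give a useful curve.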

The last thing I would check is whether the model is doing anything beyond predicting the majority class or randomly guessing. This is more of an evaluation method, to be able to say the model has learned something beyond majority-class prediction or random guessing. The DummyClassifier class in scikit-learn makes this evaluation easy: https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyClassifier.html
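A quick sketch of that baseline, using made-up imbalanced labels (80% class 0): DummyClassifier ignores the inputs entirely and always predicts the majority class, so its accuracy is the floor your finetuned model needs to beat.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical imbalanced labels: 80 examples of class 0, 20 of class 1.
y = np.array([0] * 80 + [1] * 20)
X = np.zeros((100, 1))  # features are ignored by the dummy strategy

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
baseline_acc = accuracy_score(y, baseline.predict(X))
print(baseline_acc)  # 0.8 on this data; your model should clearly beat it
```

If your finetuned model's accuracy is at or below this number, it has not learned anything useful from the features.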

For causal language modeling, a metric that may be of interest is perplexity. This gives you a single number with which to evaluate the “goodness” of your model on held-out text.
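Perplexity is simply the exponential of the mean per-token cross-entropy, which is what Trainer reports as the evaluation loss. A minimal sketch, where the loss value is hypothetical (in practice you might take it from trainer.evaluate()["eval_loss"]):

```python
import math

def perplexity(mean_nll):
    """Perplexity = exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_nll)

eval_loss = 3.0  # hypothetical mean cross-entropy from evaluation
print(perplexity(eval_loss))  # ~20.09: model is about as uncertain as
                              # a uniform choice over ~20 tokens per step
```

Lower is better; a perplexity of 1.0 would mean the model predicts every token with certainty, while a perplexity near the vocabulary size means it has learned nothing.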

how did you call your function?

I’m not sure what you mean here. If you’re referring to how to begin training of the model, it is as you have written in your code. Namely, calling the Trainer.train() method.

If you are referring to how you load your trained model for inference, I can’t recall exactly how that’s done, but I know there are several posts on this forum that describe it. I would search for something like “load and call trained model” or “using finetuned model for inference”.
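That said, a common pattern is save_pretrained followed by from_pretrained on the checkpoint directory. A sketch, where a tiny randomly-initialized GPT-2 stands in for your finetuned checkpoint so the snippet runs without any downloads (in practice the directory would be your Trainer output_dir, and you would tokenize a real prompt rather than use raw ids):

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny random GPT-2 as a stand-in for your finetuned model.
tiny = GPT2LMHeadModel(GPT2Config(
    vocab_size=100, n_positions=64, n_embd=32, n_layer=2, n_head=2,
))
tiny.save_pretrained("my-finetuned-gpt2")  # Trainer also writes this layout

# Reload for inference, exactly as you would with a real checkpoint dir.
model = GPT2LMHeadModel.from_pretrained("my-finetuned-gpt2")
model.eval()

prompt_ids = torch.tensor([[1, 2, 3]])  # stand-in for tokenizer output
with torch.no_grad():
    out = model.generate(prompt_ids, max_new_tokens=5, do_sample=False)
print(out.shape)  # batch of 1, prompt plus up to 5 generated token ids
```

With a real checkpoint you would also load the matching tokenizer (AutoTokenizer.from_pretrained on the same directory) to encode the prompt and decode the output.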

I hope this helps.
