Loading finetuned model to generate text

This is a basic question. Suppose I finetune a pretrained GPT2 model on my particular corpus. How do I go about loading that new model and generating some text from that new model?

There's a great tutorial on the generate method, which lets you do greedy decoding, sampling, and beam search. It has all the details you are looking for.

@valhalla Yes that tutorial is fantastic. Thank you for linking it.

My finetuned model inherits from GPT2LMHeadModel. I wanted to check whether the best way to load my finetuned model is via from_pretrained(), and whether my model will be able to generate text from the new corpus it was finetuned on.

If your finetuned model is just tuned (and not extended with extra layers) then save_pretrained and from_pretrained should work, and may be the easiest.
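For a model that keeps the stock architecture, a minimal round trip might look like the sketch below (the '/path/to/model/' directory is a placeholder, and the fine-tuning step is elided):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the base checkpoint, fine-tune it elsewhere
model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# ... fine-tuning happens here ...

# save_pretrained writes both the config and the weights
model.save_pretrained('/path/to/model/')
tokenizer.save_pretrained('/path/to/model/')

# Later, in a fresh process, restore both from the same directory
model = GPT2LMHeadModel.from_pretrained('/path/to/model/')
tokenizer = GPT2Tokenizer.from_pretrained('/path/to/model/')
```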

If you have altered your model (with extra layers), then I think you need to use torch.save and torch.load, and redefine your model before loading, e.g.:

torch.save(model.state_dict(), 'filepath')

model = ATBertClass() # redefine using your custom model definition
model.load_state_dict(torch.load('filepath'))

@rgwatwormhill, I just fine tuned the model and did not alter it.

If the name of that model class was MyGPT2Model and I saved it using Trainer, to load it would simply be the line MyGPT2Model.from_pretrained('/path/to/model/')?

And since it inherits from GPT2LMHeadModel I can call the generate method to generate text like:

import MyGPT2Model
from transformers import GPT2Tokenizer

prompt = 'Huggingface Transformers is fantastic!'
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
encoded_results = tokenizer(prompt, return_tensors='pt')
tok_ids = encoded_results['input_ids']

my_model = MyGPT2Model.from_pretrained('/path/to/model/')
gen_ids = my_model.generate(tok_ids)
decoded_str = tokenizer.decode(gen_ids[0])  # generate returns a batch, so decode the first sequence

Is this correct?

Hi aclifton314,

Short answer: I’ve no idea - what happens if you try that?

I’ve not tried using Trainer or GPT2 (I’ve been using Bert with native pytorch commands), so I could easily be wrong here. Please ignore anything where you know better.

In the Trainer documentation, it looks like from_pretrained is the correct command to use.

Did you set output_dir in your Trainer training arguments? Have you checked that your trained model exists in your /path/to/model/ directory?

I don’t think you can import your custom model class (I think you will need to include your code to define it).

If you are not altering the shape of the model, do you need a custom model class?

I think you might need to import GPT2LMHeadModel, whether you use it as-is or by inheriting.

I think you might need to use tokenizer.tokenize() or tokenizer.encode_plus() or something like that, rather than just tokenizer()


Short answer: I’ve no idea - what happens if you try that?

It looks like it’s working!

Did you set output_dir in your Trainer training arguments? Have you checked that your trained model exists in your /path/to/model/ directory?

Yes I did. The model does exist in the output directory set in the training arguments.

Thank you for your answer, but I have a question for you.
I validate the model as I train it, and save the model with the highest score on the validation set using torch.save(model.state_dict(), output_model_file), as shown in the figure below.

Then I trained again and loaded the previously saved model instead of training from scratch, but it didn't work well, which makes me wonder whether it was saved or loaded successfully:
model.load_state_dict(torch.load(output_model_file))

My model code is here

Hi @mathor,

what learning rate did you use for the second round of training?

Were you using similar training data or very different training data?

Did you save the optimizer state_dict as well as the model state_dict?

This post might help: Checkpoint vs model weight

If you load your saved (fine-tuned) model, and do a validation check before you start any more training, what kind of validation accuracy do you get?
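In case it's useful, here's the pattern I mean for saving the optimizer state alongside the model, so a second round of training resumes cleanly (the tiny nn.Linear is just a stand-in for your real model):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Save model and optimizer state together in one checkpoint file
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, 'checkpoint.pt')

# Resume: rebuild the objects first, then restore their states
new_model = nn.Linear(4, 2)
new_optimizer = torch.optim.Adam(new_model.parameters(), lr=1e-4)
ckpt = torch.load('checkpoint.pt')
new_model.load_state_dict(ckpt['model_state_dict'])
new_optimizer.load_state_dict(ckpt['optimizer_state_dict'])
```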