Loading finetuned model to generate text

This is a basic question. Suppose I finetune a pretrained GPT2 model on my particular corpus. How do I go about loading that new model and generating some text from that new model?

There's a great tutorial on the generate method, which lets you do greedy decoding, sampling, and beam search. It has all the details you are looking for.

@valhalla Yes that tutorial is fantastic. Thank you for linking it.

My finetuned model inherits from GPT2LMHeadModel. I wanted to check whether the best way to load my finetuned model is via from_pretrained(), and whether my model will be able to generate text from the new corpus it was finetuned on.

If your finetuned model is just tuned (and not extended with extra layers) then save_pretrained and from_pretrained should work, and may be the easiest.
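For a model that keeps the stock architecture, a minimal round trip might look like the sketch below (the '/path/to/model/' directory is a placeholder, and the fine-tuning step is elided):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the base checkpoint, fine-tune it elsewhere
model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# ... fine-tuning happens here ...

# save_pretrained writes both the config and the weights
model.save_pretrained('/path/to/model/')
tokenizer.save_pretrained('/path/to/model/')

# Later, in a fresh process, restore both from the same directory
model = GPT2LMHeadModel.from_pretrained('/path/to/model/')
tokenizer = GPT2Tokenizer.from_pretrained('/path/to/model/')
```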

If you have altered your model (with extra layers), then I think you need to use torch.save and torch.load, and redefine your model before loading, e.g.:

torch.save(model.state_dict(), 'filepath')

model = ATBertClass() # redefine using your custom model definition
model.load_state_dict(torch.load('filepath'))

@rgwatwormhill, I just fine tuned the model and did not alter it.

If the name of that model class was MyGPT2Model and I saved it using Trainer, to load it would simply be the line MyGPT2Model.from_pretrained('/path/to/model/')?

And since it inherits from GPT2LMHeadModel I can call the generate method to generate text like:

import MyGPT2Model
from transformers import GPT2Tokenizer

prompt = 'Huggingface Transformers is fantastic!'
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
encoded_results = tokenizer(prompt, return_tensors='pt')
tok_ids = encoded_results['input_ids']

my_model = MyGPT2Model.from_pretrained('/path/to/model/')
gen_ids = my_model.generate(tok_ids)
decoded_str = tokenizer.decode(gen_ids[0])  # generate returns a batch, so decode the first sequence

Is this correct?

Hi aclifton314,

Short answer: I’ve no idea - what happens if you try that?

I’ve not tried using Trainer or GPT2 (I’ve been using Bert with native pytorch commands), so I could easily be wrong here. Please ignore anything where you know better.

In the Trainer documentation, it looks like from_pretrained is the correct command to use.

Did you set output_dir in your Trainer training arguments? Have you checked that your trained model exists in your /path/to/model/ directory?

I don’t think you can import your custom model class (I think you will need to include your code to define it).

If you are not altering the shape of the model, do you need a custom model class?

I think you might need to import GPT2LMHeadModel, whether you use it as-is or by inheriting.

I think you might need to use tokenizer.tokenize() or tokenizer.encode_plus() or something like that, rather than just tokenizer()


Short answer: I’ve no idea - what happens if you try that?

It looks like it’s working!

Did you set output_dir in your Trainer training arguments? Have you checked that your trained model exists in your /path/to/model/ directory?

Yes I did. The model does exist in the output directory set in the training arguments.

Thank you for your answer, but I have a question for you.
I validate the model as I train it, and save the model with the highest score on the validation set using torch.save(model.state_dict(), output_model_file), as shown in the figure below.

Then I trained again and loaded the previously saved model instead of training from scratch, but it didn't work well, which makes me wonder whether it was saved or loaded successfully:
model.load_state_dict(torch.load(output_model_file))

My model code is here

Hi @mathor,

what learning rate did you use for the second round of training?

Were you using similar training data or very different training data?

Did you save the optimizer state_dict as well as the model state_dict?

This post might help: Checkpoint vs model weight

If you load your saved (fine-tuned) model, and do a validation check before you start any more training, what kind of validation accuracy do you get?
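In case it's useful, here's the pattern I mean for saving the optimizer state alongside the model, so a second round of training resumes cleanly (the tiny nn.Linear is just a stand-in for your real model):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Save model and optimizer state together in one checkpoint file
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, 'checkpoint.pt')

# Resume: rebuild the objects first, then restore their states
new_model = nn.Linear(4, 2)
new_optimizer = torch.optim.Adam(new_model.parameters(), lr=1e-4)
ckpt = torch.load('checkpoint.pt')
new_model.load_state_dict(ckpt['model_state_dict'])
new_optimizer.load_state_dict(ckpt['optimizer_state_dict'])
```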