Load checkpoint from Trainer

qmin2 · February 13, 2024, 11:34am

from torch import nn
from transformers import LlamaForCausalLM
from peft import get_peft_model

class MyModel(nn.Module):
    def __init__(self, model_args, data_args, training_args, lora_config):
        super().__init__()
        self.model_args = model_args
        self.data_args = data_args
        self.training_args = training_args
        self.device = training_args.device
        # Load the base LLaMA model, then wrap it with LoRA adapters
        self.llama = LlamaForCausalLM.from_pretrained(
            model_args.model_name_or_path,
            cache_dir=model_args.cache_dir,
        )
        self.llama = get_peft_model(self.llama, lora_config)

This is my custom model code.

from transformers import Seq2SeqTrainer

model = MyModel(model_args, data_args, training_args, lora_config)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=valid_data,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics if training_args.predict_with_generate else None,
)

During training, I saved a checkpoint every 1000 steps.
After training, I'd like to evaluate these checkpoints by loading them.

In this case, model = AutoModel.from_pretrained("checkpoint_dir") raises an error saying that there is no config.json.

In the example code in Hugging Face Transformers, the model is defined from the start as a Hugging Face model class such as GPT2LMHeadModel, so model = GPT2LMHeadModel.from_pretrained("checkpoint_dir") works.
My situation is different because my model is a custom nn.Module.

How can I load a saved checkpoint of a custom model that was trained with the Hugging Face Trainer, so that I can evaluate it?
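
One possible approach, sketched here under assumptions rather than taken from the post: when the model passed to the Trainer is a plain nn.Module and not a PreTrainedModel, the Trainer saves only its state dict, so there is no config.json for from_pretrained to read. The weights can instead be copied into a freshly constructed model with load_state_dict. The helper name load_checkpoint_weights is hypothetical, and the file names model.safetensors and pytorch_model.bin are assumptions that depend on the transformers version and the save_safetensors setting.

```python
import os

import torch
from torch import nn

# Assumed file names for a state dict written by the Trainer;
# check the actual contents of your checkpoint directory.
SAFETENSORS_FILE = "model.safetensors"
TORCH_FILE = "pytorch_model.bin"


def load_checkpoint_weights(model: nn.Module, checkpoint_dir: str) -> nn.Module:
    """Hypothetical helper: copy the weights saved in checkpoint_dir into model."""
    safetensors_path = os.path.join(checkpoint_dir, SAFETENSORS_FILE)
    torch_path = os.path.join(checkpoint_dir, TORCH_FILE)

    if os.path.isfile(safetensors_path):
        # Imported lazily so the helper also works without safetensors installed
        from safetensors.torch import load_file

        state_dict = load_file(safetensors_path, device="cpu")
    elif os.path.isfile(torch_path):
        state_dict = torch.load(torch_path, map_location="cpu", weights_only=True)
    else:
        raise FileNotFoundError(f"No weight file found in {checkpoint_dir!r}")

    # The model must be built exactly as it was for training,
    # otherwise the parameter names in the state dict will not match.
    model.load_state_dict(state_dict)
    return model
```

To use it, build the model the same way as for training, for example MyModel(model_args, data_args, training_args, lora_config), call load_checkpoint_weights(model, checkpoint_dir) with the path of the saved checkpoint, then pass the model to Seq2SeqTrainer and call trainer.evaluate().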
