How to save my model to use it later

Hello Amazing people,
This is my first post and I am really new to machine learning and Hugging Face.

I followed this awesome guide here: Multilabel Classification with DistilBert

and used my own dataset, and the results are very good. I am having a hard time now trying to understand how to save the model I trained and all the artifacts needed to use it later.

I tried, at the end of the tutorial, torch.save(trainer, 'my_model') but I got this error message:

AttributeError: Can't pickle local object 'get_linear_schedule_with_warmup.<locals>.lr_lambda'

I have the following files saved for each epoch:

config.json
optimizer.pt
pytorch_model.bin
rng_state.pth
special_tokens_map.json
tokenizer.json
tokenizer_config.json
trainer_state.json
training_args.bin
vocab.txt

Can someone kindly guide me on how to save this model for later use?
Thank you very much

2 Likes

Hello there,

You can save models with trainer.save_model("path_to_save"). Another cool thing you can do is push your model to the Hugging Face Hub as well. I added a couple of lines to the notebook to show you, here. You can find the pushing part there.
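For example, roughly like this (the path is a placeholder, and pushing assumes you are logged in to the Hub):

trainer.save_model("path_to_save")  # writes the weights and config (plus tokenizer files if a tokenizer was passed to the Trainer)

# optional: push the trained model to the Hugging Face Hub
trainer.push_to_hub()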

12 Likes

Thank you very much for helping me Merve. Huge Thanks.
Just one more question if you don’t mind: I’ll now use my model locally at first. You helped me to save all the files I need to load it again.

So to use the same model I saved with trainer.save_model(path), I just need to use trainer.load(path)?

Thank you very much :wink:

1 Like

Hello again,

You can simply load the model using the model class's from_pretrained(model_path) method, like below:
(you can either save locally and load from local or push to Hub and load from Hub)

from transformers import BertConfig, BertModel
# if model is on hugging face Hub
model = BertModel.from_pretrained("bert-base-uncased")
# from local folder
model = BertModel.from_pretrained("./test/saved_model/")

Another cool thing you can use is the pipeline API; it will make your life much easier :slightly_smiling_face:. With pipelines, you do not have to deal with the internals of the model or the tokenizer to run inference: you simply give it the folder and it makes the model ready for inference.
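For example, a minimal sketch assuming the model was saved to ./my_model:

from transformers import pipeline

# the pipeline loads both the model and the tokenizer from the saved folder (placeholder path)
classifier = pipeline("text-classification", model="./my_model")
print(classifier("I loved this movie!"))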

11 Likes

You are amazing Merve :wink: I’ll try to do these steps now. Let’s see how it goes.
Thank you again

2 Likes

Hello again,

So I followed that tutorial to train my model (using distilbert-base-uncased).
I saved the model with:

trainer.save_model("./my_model")

and then I loaded the model:

from transformers import DistilBertConfig, DistilBertModel
path = 'path_to_my_model'
model = DistilBertModel.from_pretrained(path)

Now I followed the same tutorial for inference, but then I run:

encoding = tokenizer(text, return_tensors="pt")

encoding = {k: v.to(trainer.model.device) for k,v in encoding.items()}
outputs = trainer.model(**encoding)

and then:

logits = outputs.logits

which raises the following error:

AttributeError: 'DistilBertModel' object has no attribute 'logits'

How can I fix this step?

Thank you very much

1 Like

I found the error: instead of
model = DistilBertModel.from_pretrained(path)
I changed to
model = AutoModelForSequenceClassification.from_pretrained(path)

2 Likes

@slowturtle Just to avoid confusion in the future: the BertModel/DistilBertModel classes are the bare models without classification heads on top, while the *ForSequenceClassification classes include the classification head (and thus produce logits).
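A minimal sketch of the difference (the path is a placeholder):

from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification

path = "path_to_my_model"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(path)
encoding = tokenizer("some text", return_tensors="pt")

bare = AutoModel.from_pretrained(path)                          # bare encoder: hidden states only
clf = AutoModelForSequenceClassification.from_pretrained(path)  # encoder + classification head

print(bare(**encoding).last_hidden_state.shape)  # no .logits attribute here
print(clf(**encoding).logits)                    # logits come from the classification head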

Hi Merve!

I might be late, but the tutorial that you have shared is excellent. My only question is: can the same model be trained for a multiclass text classification problem as well? If so, what parameters do I need to keep in mind while training it? And will this be successful for smaller datasets (<1000 records)? It would be great to see if you have a notebook for this problem statement as well.

Thanks
Ishan

1 Like

Hi!

I run out of CUDA memory when saving a larger model this way. Is there a way I can move a GPU-trained model to 'cpu' before saving it with trainer.save_model(...)? Appreciate the help, thanks!

1 Like

Hello. After running a DistilBERT model, fine-tuned on my own custom dataset for classification purposes, I try to save the model in a .pth file format (e.g. distilmodel.pth). After training the model using the Trainer from the transformers library, it saves a couple of files into a checkpoint output folder, as declared in the Trainer's arguments.
Any help converting the checkpoint into a model.pth file?
Thanks in advance.
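Something along these lines is what I am after, if it is the right approach (paths are placeholders):

import torch
from transformers import AutoModelForSequenceClassification

# load from a checkpoint folder written by the Trainer (placeholder path)
model = AutoModelForSequenceClassification.from_pretrained("./output/checkpoint-500")

# save just the weights as a .pth state dict
torch.save(model.state_dict(), "distilmodel.pth")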

What if we want to take a base model from the Hugging Face Hub, train it, save the fine-tuned model, and then train it further? I want to train the model iteratively on subsets of my data so I don't have to train it all at once: training on everything would take a few weeks, and I am afraid it will crash towards the end and waste the experiment. I also want to be able to test the output between subsets of data.

Currently, when I try to load a custom model and tokenizer, though I can generate text with the model no problem, I get the below error when I attempt to train it further:

Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)

The thing is, this is not an issue when I train the base model initially. I have even tried forcing the data to be on the GPU before training, and then I just get the same error complaining about cuda:0 and cuda:3. I think the data moves to the GPU after Trainer.train() is called, and all my settings are the same besides the fact that I am referencing my locally saved model and tokenizer path instead of the Hugging Face web path. Do I need to push my model to the Hugging Face Hub and then download it from there? I looked at the folders that are cached from downloading the model and there are quite a few extra files cached aside from the files created when I save the model to a local folder. Any help would be very appreciated.
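For reference, my setup is roughly like this (paths are placeholders, and the data subset is prepared elsewhere):

from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments

model_path = "./my_saved_model"  # local folder written when I saved the model and tokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

training_args = TrainingArguments(output_dir="./out", num_train_epochs=1)
trainer = Trainer(model=model, args=training_args, train_dataset=train_subset)  # train_subset: the next chunk of data
trainer.train()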

I am using this repo to run a translation task. Specifically, I'm using it to build a diacritization model. I need to save the model after the process is done. How can I do that?

CUDA_VISIBLE_DEVICES=0 python run_translation.py \
  --model_name_or_path Davlan/oyo-t5-small \
  --do_train --do_eval --do_predict \
  --source_lang unyo --target_lang dcyo \
  --source_prefix "<unyo2dcyo>: " \
  --train_file data_prep_eng/output_data/bible_train.json \
  --validation_file data_prep_eng/output_data/dev.json \
  --test_file data_prep_eng/output_data/test.json \
  --output_dir oyot5_small_unyo_dcyo_bible \
  --max_source_length 512 --max_target_length 512 \
  --per_device_train_batch_size=24 --per_device_eval_batch_size=24 \
  --num_train_epochs 3 --overwrite_output_dir --predict_with_generate \
  --save_steps 10000 --num_beams 10

Am I missing a flag like --save-model? I need the saved model to be part of the directory.

See what I have now:

Yes, you can. Assuming you are using torch:

DEVICE = "cpu"
# assuming a Hugging Face / PyTorch model
your_model.to(DEVICE)

You can move the model back when loading:

GPU_DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
your_model.to(GPU_DEVICE)

Hi, thanks for the answer. But is there a method or convention for saving models WITHOUT using the Trainer?
I prefer to fine-tune my model by training in the traditional PyTorch way because it's more flexible for adding my own ideas, but I find it difficult to save it. The error message says that I shouldn't use the identical checkpointing as the original model. What does that mean? Is there any way to solve it?
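What I have in mind is a plain PyTorch loop with a save at the end, roughly like this (a simplified sketch, names are placeholders):

import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
# ... my own training loop here ...

# option 1: save in the Hugging Face format (a folder with config + weights), reloadable with from_pretrained
model.save_pretrained("./my_finetuned_model")

# option 2: save only the weights as a state dict
torch.save(model.state_dict(), "my_finetuned_model.pth")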

1 Like