Hi everyone, I have the code below, but it isn't saving the model. Can anyone give me suggestions or tell me why it isn't saving?
import torch
from torch.utils.data import DataLoader
from torch.optim import AdamW
import pandas as pd
from app.utils import data_utils
from datasets import Dataset, load_dataset
from transformers import AutoTokenizer, DistilBertForSequenceClassification, get_scheduler
from tqdm.auto import tqdm
import evaluate
# Fine-tuning DistilBERT for news sentiment analysis (using Kaggle news sentiment analysis) -> 3 labels
# 0 = Negative
# 1 = Neutral
# 2 = Positive
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained(model_name, num_labels=3)
# Other test news sentiment
# ds = load_dataset("sara-nabhani/ML-news-sentiment")
# Tokenize Function
def tokenize_function(data):
    return tokenizer(data["text"], padding="max_length", truncation=True)
# https://www.kaggle.com/datasets/clovisdalmolinvieira/news-sentiment-analysis?resource=download
# dataset = load_dataset("yelp_review_full")
# print(dataset)
# Load in dataset and shape it into pandas dataframe
df = data_utils.load_raw_data("kaggle_news_sentiment_analysis.csv")
df["text"] = df["Title"] + ": " + df["Description"]
df = df[["text", "Sentiment"]]
replacements = {"negative": 0, "neutral": 1, "positive": 2}
df["Sentiment"] = df["Sentiment"].map(replacements).fillna(df["Sentiment"])
df.rename(columns={"Sentiment": "labels"}, inplace=True)
print(df)
# Create train and test datasets
train = df.head(3000)
test = df.tail(500)
# Convert from pandas DataFrames to Hugging Face Datasets
train = Dataset.from_pandas(train)
print(train)
test = Dataset.from_pandas(test)
print(test)
# Tokenize the datasets
tokenized_train = train.map(tokenize_function, batched=True)
tokenized_train = tokenized_train.remove_columns(["text"])
tokenized_train.set_format(type="torch")
print(tokenized_train)
tokenized_test = test.map(tokenize_function, batched=True)
tokenized_test = tokenized_test.remove_columns(["text"])
tokenized_test.set_format(type="torch")
print(tokenized_test)
# Create data loaders
train_dataloader = DataLoader(tokenized_train, shuffle=True, batch_size=10)
test_dataloader = DataLoader(tokenized_test, batch_size=10)
# Initialize optimizer
optimizer = AdamW(model.parameters(), lr=5e-5)
# Initialize learning rate scheduler
num_epochs = 3
num_training_steps = num_epochs * len(train_dataloader)
lr_scheduler = get_scheduler(name="linear", optimizer=optimizer, num_warmup_steps=0, num_training_steps=num_training_steps)
# Specify device to train on
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)
# Training loop
print("STARTING TRAINING LOOP ==============")
progress_bar = tqdm(range(num_training_steps))
model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}  # Move batch tensors to the training device
        outputs = model(**batch)  # Pass batch through model and get output
        loss = outputs.loss  # The model returns the loss because the batch includes "labels"
        loss.backward()  # Backpropagation: compute gradients from the loss
        optimizer.step()  # Update parameters
        lr_scheduler.step()  # Advance the learning-rate schedule
        optimizer.zero_grad()  # Reset gradients for the next batch
        progress_bar.update(1)  # Update progress bar
# Save model
model.save_pretrained(save_directory="./saved_models/distilbert")
This may fix the problem. If it doesn’t, there may be a bug in one of the libraries.
# Save model
model.to("cpu") # Added
model.save_pretrained(save_directory="./saved_models/distilbert")
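If that doesn't change anything, it may help to confirm that the output directory is actually being written and to try the non-safetensors code path. A small debugging sketch (the path is the one from your script; safe_serialization=False just switches to the older pytorch_model.bin format):

import os

save_dir = "./saved_models/distilbert"
os.makedirs(save_dir, exist_ok=True)  # Make sure the directory exists before saving

model.to("cpu")
model.save_pretrained(save_dir, safe_serialization=False)  # Write a .bin checkpoint instead of safetensors
tokenizer.save_pretrained(save_dir)  # Save the tokenizer alongside the model so from_pretrained works later

print(os.listdir(save_dir))  # config.json and the weights file should show up here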
GitHub issue (opened 12 Mar 2024, closed 19 Mar 2024):
### System Info
Transformers == 4.38.2
Platform == TPU V4 on GKE
Python == 3.10
### Who can help?
@ArthurZucker
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [x] My own task or dataset (give details below)
### Reproduction
I ran some tests on a GKE Cluster with TPU V4 with 4 nodes.
https://gist.github.com/moficodes/1492228c80a3c08747a973b519cc7cda
This run fails with
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/safetensors/torch.py", line 13, in storage_ptr
return tensor.untyped_storage().data_ptr()
RuntimeError: Attempted to access the data pointer on an invalid python storage.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "//fsdp.py", line 112, in <module>
model.save_pretrained(new_model_id)
File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2448, in save_pretrained
safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
File "/usr/local/lib/python3.10/site-packages/safetensors/torch.py", line 281, in save_file
serialize_file(_flatten(tensors), filename, metadata=metadata)
File "/usr/local/lib/python3.10/site-packages/safetensors/torch.py", line 470, in _flatten
shared_pointers = _find_shared_tensors(tensors)
File "/usr/local/lib/python3.10/site-packages/safetensors/torch.py", line 72, in _find_shared_tensors
if v.device != torch.device("meta") and storage_ptr(v) != 0 and storage_size(v) != 0:
File "/usr/local/lib/python3.10/site-packages/safetensors/torch.py", line 17, in storage_ptr
return tensor.storage().data_ptr()
File "/usr/local/lib/python3.10/site-packages/torch/storage.py", line 956, in data_ptr
return self._data_ptr()
File "/usr/local/lib/python3.10/site-packages/torch/storage.py", line 960, in _data_ptr
return self._untyped_storage.data_ptr()
RuntimeError: Attempted to access the data pointer on an invalid python storage.
### Expected behavior
Save the model and push to hugging face.
I’ll try it out after this, but I was wondering whether it’s possible to train the model with native PyTorch and still save it using Hugging Face’s save_pretrained() function, or whether that could be another source of the problem. I currently don’t get any error messages; it just doesn’t save.
If save_pretrained fails, it usually raises an error, so a silent failure smells like a bug…
Basically, all that HF’s save_pretrained writes is the torch model’s state_dict plus the configuration JSON, so converting a model trained with native torch to HF format is usually not that difficult.
The only real problem is when the state_dict keys change.
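For example, a round trip from a natively trained model to a from_pretrained-compatible folder can look roughly like this (a minimal sketch; the checkpoint filename and output directory are just placeholders):

import torch
from transformers import DistilBertForSequenceClassification

# After the native training loop, keep a plain torch checkpoint of the weights
torch.save(model.state_dict(), "distilbert_native.pt")

# Later (possibly in another script): rebuild the HF model and load the weights back in.
# As long as the keys match the HF architecture, this round-trips cleanly.
hf_model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=3
)
hf_model.load_state_dict(torch.load("distilbert_native.pt", map_location="cpu"))

# save_pretrained then writes config.json plus the weights in the HF layout
hf_model.save_pretrained("./saved_models/distilbert_from_native")

If the keys don’t line up (for example, a custom wrapper adds a prefix), you’d have to rename them before calling load_state_dict; that’s the case where it gets fiddly. In your case the model already is a transformers model, so you can call save_pretrained on it directly after the native loop; the sketch above is only needed when all you have is a raw state_dict.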
GitHub issue (opened 22 Apr 2024, labeled enhancement):
I'm finding this repo to be a user friendly, extensible, memory efficient solution for training/fine-tuning models. However, when it comes to inference, there is a usability gap that could be solved by converting the model into a format that can be loaded by HF's [`from_pretrained()`](https://huggingface.co/docs/transformers/v4.40.0/en/model_doc/auto#transformers.AutoModel.from_pretrained) function.
The specific thing I want to do is load a model fine-tuned with torchtune into a [Gradio chatbot, complete with token streaming](https://www.gradio.app/guides/creating-a-chatbot-fast#example-using-a-local-open-source-llm-with-hugging-face). I imagine many other downstream tasks would be made easier with this functionality as well.
Would it be reasonable to add the following options to the checkpointer?
- Save full model weights in such a way that they can be loaded with [`from_pretrained()`](https://huggingface.co/docs/transformers/v4.40.0/en/model_doc/auto#transformers.AutoModel.from_pretrained).
- Save a LoRA adapter in such a way that it can be loaded into a model with [`get_peft_model()`](https://huggingface.co/docs/peft/v0.10.0/en/package_reference/peft_model#peft.get_peft_model).
If this seems like a valid addition, and isn't a huge lift, I would be happy to give it a try.
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.