Problem with pushing quantized model to hub

gospacedev · April 29, 2024, 4:00am

Hi, everyone! I have been trying to upload a quantized model to the Hugging Face Hub with push_to_hub, but I always get this error:

AttributeError: 'str' object has no attribute 'data_ptr'

Here’s the code:

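from transformers import T5Tokenizer, T5ForConditionalGeneration
from quanto import quantize, freeze, qint8

model_id = "google/flan-t5-base"

# Load the tokenizer as well, so it can be pushed alongside the model
tokenizer = T5Tokenizer.from_pretrained(model_id)
quantized_model = T5ForConditionalGeneration.from_pretrained(model_id, low_cpu_mem_usage=True, use_safetensors=True)

# Quantize the weights to int8 and leave activations in floating point
quantize(quantized_model, weights=qint8, activations=None)

# Replace the float weights with their quantized versions
freeze(quantized_model)

tokenizer.push_to_hub("flan-t5-base-8bit")
quantized_model.push_to_hub("flan-t5-base-8bit")  # raises the AttributeError above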

Thank you in advance!
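For context on what quantize(..., weights=qint8) followed by freeze changes: after freezing, the float weights are replaced by quantized weights, i.e. low-precision integer data plus scale information rather than ordinary float tensors, which may be why generic save paths trip over them. The plain-Python sketch below illustrates symmetric per-tensor int8 quantization under that assumption; it is not quanto's implementation (quanto may use per-axis scales and a different storage layout), and the function names are hypothetical.

```python
# Hypothetical illustration of symmetric per-tensor int8 weight quantization.
# Assumption: quanto's real qint8 scheme (scale granularity, storage) may differ.


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 codes in [-127, 127] plus a single float scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Use scale 1.0 for an all-zero tensor to avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to nearest (Python's round is half-to-even), then clamp to int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the int8 codes and the scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    w = [0.5, -1.0, 0.25, 0.0]
    q, s = quantize_int8(w)
    print("codes:", q, "scale:", s)
    print("dequantized:", dequantize_int8(q, s))
```

The saved artifact in this scheme is two things (integer codes and a scale), not one float tensor, which is the kind of structure a serializer has to know about.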

nielsr · April 29, 2024, 7:39am

Hi,

Feel free to open an issue on the Quanto GitHub repository: huggingface/quanto (A PyTorch Quantization Toolkit).

Dcolinmorgan · October 14, 2024, 4:29am

Did you ever solve this? I also get it when calling .save_pretrained after freeze.

John6666 · October 14, 2024, 4:33am

AttributeError has too many possible causes to pinpoint, but it has reportedly been triggered by a transformers version that is either too old or too new.
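Since a version mismatch is one suspected cause, it helps to record the exact versions in play before digging further or opening an issue. A minimal standard-library sketch; the distribution names in PACKAGES are assumptions (older installs ship quanto, newer ones optimum-quanto), so adjust the list to your environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# Assumed list of distributions relevant to this thread; edit as needed.
PACKAGES = ["transformers", "quanto", "optimum-quanto", "torch",
            "safetensors", "huggingface_hub"]


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Return {distribution name: installed version, or None if not installed}."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

Pasting this output into a bug report covers the same ground as the version list in the GitHub issue template.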

github.com/huggingface/transformers

save quantized model throws error.

opened 02:03PM - 12 Jul 23 UTC

closed 05:57PM - 12 Jul 23 UTC

nemesis00sam

### System Info

===================================BUG REPORT================…===================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

================================================================================
bin /opt/conda/envs/pytorch/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA SETUP: CUDA runtime path found: /opt/conda/envs/pytorch/lib/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /opt/conda/envs/pytorch/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
[2023-07-12 13:52:54,626] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

- `transformers` version: 4.30.2
- Platform: Linux-5.15.0-1038-aws-x86_64-with-glibc2.31
- Python version: 3.10.12
- Huggingface_hub version: 0.16.2
- Safetensors version: 0.3.1
- PyTorch version (GPU?): 2.0.1 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

### Who can help?

_No response_

### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

Hi, I'm trying to save quantized model. First attempt didn't work. (I also opened an issue, https://github.com/huggingface/accelerate/issues/1713, to clarify it). I opened this issue because I'm receiving an error message when I run following code. I'm not sure I'm following the right instructions written on https://huggingface.co/docs/transformers/main_classes/quantization. Because model is pushed to hub in documentation. But I expect to save it to local filesystem. Thanks for your help in advance.

```
### load packages ###
import transformers
import textwrap
from transformers import LlamaTokenizer, LlamaForCausalLM
import os
import sys
from typing import List
import accelerate
from peft import (
    LoraConfig,
    get_peft_model,
    get_peft_model_state_dict,
    prepare_model_for_int8_training,
)
#import fire
import torch
from datasets import load_dataset
import pandas as pd
import deepspeed

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DEVICE

### load model ###
BASE_MODEL = "decapoda-research/llama-7b-hf"

model = LlamaForCausalLM.from_pretrained(
    BASE_MODEL,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

model.save_pretrained(save_directory="quantized_decapoda-research_llama-7b-hf_v2")
```

Error Message:

```
/opt/conda/envs/pytorch/lib/python3.10/site-packages/transformers/modeling_utils.py:1709: UserWarning: You are calling `save_pretrained` to a 8-bit converted model you may likely encounter unexepected behaviors. If you want to save 8-bit models, make sure to have `bitsandbytes>0.37.2` installed.
  warnings.warn(
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[3], line 1
----> 1 model.save_pretrained(save_directory="quantized_decapoda-research_llama-7b-hf_v2")

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/transformers/modeling_utils.py:1820, in PreTrainedModel.save_pretrained(self, save_directory, is_main_process, state_dict, save_function, push_to_hub, max_shard_size, safe_serialization, variant, **kwargs)
   1817 weights_name = SAFE_WEIGHTS_NAME if safe_serialization else WEIGHTS_NAME
   1818 weights_name = _add_variant(weights_name, variant)
-> 1820 shards, index = shard_checkpoint(state_dict, max_shard_size=max_shard_size, weights_name=weights_name)
   1822 # Clean the folder from a previous save
   1823 for filename in os.listdir(save_directory):

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/transformers/modeling_utils.py:318, in shard_checkpoint(state_dict, max_shard_size, weights_name)
    315 storage_id_to_block = {}
    317 for key, weight in state_dict.items():
--> 318     storage_id = id_tensor_storage(weight)
    320     # If a `weight` shares the same underlying storage as another tensor, we put `weight` in the same `block`
    321     if storage_id in storage_id_to_block:

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/transformers/pytorch_utils.py:290, in id_tensor_storage(tensor)
    283 def id_tensor_storage(tensor: torch.Tensor) -> Tuple[torch.device, int, int]:
    284     """
    285     Unique identifier to a tensor storage. Multiple different tensors can share the same underlying storage. For
    286     example, "meta" tensors all share the same storage, and thus their identifier will all be equal. This identifier is
    287     guaranteed to be unique and constant for this tensor's storage during its lifetime. Two tensor storages with
    288     non-overlapping lifetimes may have the same id.
    289     """
--> 290     return tensor.device, storage_ptr(tensor), storage_size(tensor)

AttributeError: 'str' object has no attribute 'device'
```

### Expected behavior

Save quantized model to local filesystem.
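The traceback in that issue shows where the error comes from: shard_checkpoint iterates over state_dict.items() and calls id_tensor_storage on every value, which reads tensor.device and the storage pointer. Both messages in this thread ('str' object has no attribute 'device' and 'str' object has no attribute 'data_ptr') therefore suggest that a plain string ended up among the state-dict values. The helper below is a hypothetical diagnostic, not part of transformers or quanto, that lists such entries; it uses duck typing (an assumption) so it runs without importing torch.

```python
from typing import Any, Dict, Mapping


def find_non_tensor_entries(state_dict: Mapping[str, Any]) -> Dict[str, str]:
    """Return {key: type name} for state-dict values that are not tensor-like.

    Any entry reported here would fail in a save path that, like the traceback
    above, reads ``.device`` and the storage pointer of every value.
    """
    # Duck typing instead of torch.is_tensor: real torch.Tensor objects
    # (including subclasses) expose both a ``device`` and a ``data_ptr``.
    return {
        key: type(value).__name__
        for key, value in state_dict.items()
        if not (hasattr(value, "device") and hasattr(value, "data_ptr"))
    }
```

On a real model you would call find_non_tensor_entries(quantized_model.state_dict()) after freeze. Whether any entries show up depends on the installed quanto and transformers versions, so treat the result as a clue for the bug report rather than a fix.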
