Hugging Face Forums
Loading quantized model on CPU only
🤗Transformers
chanansh
June 1, 2023, 7:19pm
I have a similar issue when loading a quantized model on a CPU-only machine:

AssertionError: Torch not compiled with CUDA enabled
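For context, this AssertionError usually means a CUDA-only code path (for example bitsandbytes' `load_in_8bit`) ran on a torch build compiled without CUDA. A minimal sketch of one common workaround, requesting 8-bit loading only when CUDA is actually available (the helper function and model name below are illustrative, not from this thread):

```python
def loading_kwargs(cuda_available: bool) -> dict:
    """Pick from_pretrained kwargs for a quantized vs. plain CPU load (sketch)."""
    if cuda_available:
        return {"load_in_8bit": True}   # 8-bit quantized load; requires a CUDA torch build
    return {}                           # full-precision CPU load, no bitsandbytes

# Hypothetical usage:
# import torch
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "facebook/opt-125m", **loading_kwargs(torch.cuda.is_available())
# )
```

On a CPU-only box this simply skips quantized loading; for genuinely quantized CPU inference, CPU-oriented formats (e.g. GGUF-based runtimes) are the usual alternative.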