I’m running the Python 3 code below in a Jupyter notebook. The code creates an instance of the llama-2-7b-chat model from weights that have been quantized in GGUF format, loading them with the ctransformers module. When I try to create the LLM instance from the pretrained weights, I get the error message:
"OSError: libcudart.so.12: cannot open shared object file: No such file or directory"
The full code and error message are below. I’m running the code on Ubuntu Server 18.04 LTS. The relevant Python modules in the conda virtual environment I’m using are also listed below.
Can anyone see what the issue is and suggest how to solve it? It looks like there may be a dependency on CUDA 12, while the nvidia-smi output below shows my GPU driver supports at most CUDA 11.4. Can anyone tell whether it’s possible for me to run ctransformers on my GPU at all?
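For context, the same dlopen that fails inside ctransformers can be probed directly from Python with ctypes, which shows which CUDA runtime sonames the dynamic linker can actually resolve on this machine. A minimal sketch (the soname list is just illustrative; adjust for your installation):

```python
import ctypes

def check_cudart(sonames=("libcudart.so.12", "libcudart.so.11.0", "libcudart.so")):
    """Return a dict mapping each soname to True if dlopen succeeds."""
    results = {}
    for soname in sonames:
        try:
            ctypes.CDLL(soname)
            results[soname] = True
        except OSError:
            results[soname] = False
    return results

print(check_cudart())
```

On my machine `libcudart.so.12` is not resolvable, which matches the error below.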
Python modules:
torchaudio 2.0.0 py310_cu117 pytorch
torchtriton 2.0.0 py310 pytorch
torchvision 0.15.0 py310_cu117 pytorch
pytorch 2.0.0 py3.10_cuda11.7_cudnn8.5.0_0 pytorch
pytorch-cuda 11.7 h778d358_5 pytorch
pytorch-mutex 1.0 cuda pytorch
ctransformers 0.2.27 pypi_0 pypi
cuda-cudart 11.7.99 0 nvidia
cuda-cupti 11.7.101 0 nvidia
cuda-libraries 11.7.1 0 nvidia
cuda-nvrtc 11.7.99 0 nvidia
cuda-nvtx 11.7.91 0 nvidia
cuda-runtime 11.7.1 0 nvidia
cudatoolkit 10.1.243 h6bb024c_0 nvidia
cudnn 7.6.5 cuda10.1_0 anaconda
code:
import os
import ctransformers

# Set the path to the model file
download_path = '/home/username/stuff/username_storage/LLM/llama/gguf/llama-2-7b-chat.Q4_K_M.gguf'
model_path = os.path.join(os.getcwd(), download_path)

# Create the AutoModelForCausalLM instance
llm = ctransformers.AutoModelForCausalLM.from_pretrained(
    model_path,
    model_type="gguf",
    gpu_layers=5,
    threads=24,
    reset=False,
    context_length=10000,
    stream=True,
    max_new_tokens=256,
    temperature=0.8,
    repetition_penalty=1.1,
)
#Start a conversation loop
error:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
Cell In[2], line 11
8 model_path = os.path.join(os.getcwd(), download_path)
9 #Create the AutoModelForCausalLM class
---> 11 llm = ctransformers.AutoModelForCausalLM.from_pretrained(model_path, model_type="gguf", gpu_layers=5, threads=24, reset=False, context_length=10000, stream=True,max_new_tokens=256, temperature=0.8, repetition_penalty=1.1)
File ~/anaconda3/envs/llm_gguf/lib/python3.10/site-packages/ctransformers/hub.py:175, in AutoModelForCausalLM.from_pretrained(cls, model_path_or_repo_id, model_type, model_file, config, lib, local_files_only, revision, hf, **kwargs)
167 elif path_type == "repo":
168 model_path = cls._find_model_path_from_repo(
169 model_path_or_repo_id,
170 model_file,
171 local_files_only=local_files_only,
172 revision=revision,
173 )
--> 175 llm = LLM(
176 model_path=model_path,
177 model_type=model_type,
178 config=config.config,
179 lib=lib,
180 )
181 if not hf:
182 return llm
File ~/anaconda3/envs/llm_gguf/lib/python3.10/site-packages/ctransformers/llm.py:246, in LLM.__init__(self, model_path, model_type, config, lib)
240 raise ValueError(
241 "Unable to detect model type. Please specify a model type using:\n\n"
242 " AutoModelForCausalLM.from_pretrained(..., model_type='...')\n\n"
243 )
244 model_type = "gguf"
--> 246 self._lib = load_library(lib, gpu=config.gpu_layers > 0)
247 self._llm = self._lib.ctransformers_llm_create(
248 model_path.encode(),
249 model_type.encode(),
250 config.to_struct(),
251 )
252 if self._llm is None:
File ~/anaconda3/envs/llm_gguf/lib/python3.10/site-packages/ctransformers/llm.py:126, in load_library(path, gpu)
124 if "cuda" in path:
125 load_cuda()
--> 126 lib = CDLL(path)
128 lib.ctransformers_llm_create.argtypes = [
129 c_char_p, # model_path
130 c_char_p, # model_type
131 ConfigStruct, # config
132 ]
133 lib.ctransformers_llm_create.restype = llm_p
File ~/anaconda3/envs/llm_gguf/lib/python3.10/ctypes/__init__.py:374, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
371 self._FuncPtr = _FuncPtr
373 if handle is None:
--> 374 self._handle = _dlopen(self._name, mode)
375 else:
376 self._handle = handle
OSError: libcudart.so.12: cannot open shared object file: No such file or directory
I ran the command below on my Ubuntu 18.04 LTS server:
nvidia-smi
and got this output:
Tue Nov 7 17:55:41 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:42:00.0 Off | N/A |
| 0% 33C P8 10W / 260W | 1908MiB / 7974MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 42819 C ...3/envs/new_llm/bin/python 1905MiB |
+-----------------------------------------------------------------------------+
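To illustrate why I suspect a version mismatch: the nvidia-smi header reports the highest CUDA runtime version the installed driver supports (11.4 here), which would explain why a library linked against libcudart.so.12 cannot be loaded. A small sketch that pulls that field out of the header (the sample string is copied from the output above):

```python
import re

def cuda_version_from_smi(smi_header: str):
    """Extract the 'CUDA Version' field reported by nvidia-smi."""
    m = re.search(r"CUDA Version:\s*([\d.]+)", smi_header)
    return m.group(1) if m else None

header = "| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4 |"
print(cuda_version_from_smi(header))  # -> 11.4
```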