Running ctransformers with CUDA 11.4 or lower

I’m running the Python 3 code below in a Jupyter notebook. The code tries to create an instance of the llama-2-7b-chat model from weights that were quantized in the GGUF format, loading them with the ctransformers module. When I call `from_pretrained` on the quantized weights, I get the error message:

    "OSError: libcudart.so.12: cannot open shared object file: No such file or directory"

The full code and error message are below. I’m running the code on Ubuntu Server 18.04 LTS. The relevant Python modules in the conda virtual environment I’m using are also listed below.

Can anyone see what the issue is and suggest how to solve it? It looks like there may be a dependency on CUDA 12, while the nvidia-smi output below suggests my GPU driver supports CUDA 11.4 at most. Is it possible for me to run ctransformers on my GPU at all?
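The `libcudart.so.12` in the error is the CUDA 12 runtime library, so a first step is to check which CUDA runtime versions are actually visible on the machine. A minimal diagnostic sketch (the search paths are assumptions for a typical Ubuntu + conda setup, and `find_cudart_libs` is a hypothetical helper name):

```python
import glob
import os

def find_cudart_libs():
    """Search common locations for libcudart shared objects to see
    which CUDA runtime versions are installed and visible."""
    patterns = [
        "/usr/local/cuda*/lib64/libcudart.so*",     # system CUDA toolkits
        "/usr/lib/x86_64-linux-gnu/libcudart.so*",  # distro packages
        os.path.join(os.environ.get("CONDA_PREFIX", ""),
                     "lib", "libcudart.so*"),       # active conda env
    ]
    found = []
    for pattern in patterns:
        found.extend(glob.glob(pattern))
    return sorted(set(found))

print(find_cudart_libs())
```

If no `libcudart.so.12*` appears in the output, the library that ctransformers is trying to dlopen simply isn’t installed anywhere the loader can find it.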

Python modules:

    torchaudio                2.0.0               py310_cu117    pytorch
    torchtriton               2.0.0                     py310    pytorch
    torchvision               0.15.0              py310_cu117    pytorch
    pytorch                   2.0.0           py3.10_cuda11.7_cudnn8.5.0_0    pytorch
    pytorch-cuda              11.7                 h778d358_5    pytorch
    pytorch-mutex             1.0                        cuda    pytorch
    ctransformers             0.2.27                   pypi_0    pypi
    cuda-cudart               11.7.99                       0    nvidia
    cuda-cupti                11.7.101                      0    nvidia
    cuda-libraries            11.7.1                        0    nvidia
    cuda-nvrtc                11.7.99                       0    nvidia
    cuda-nvtx                 11.7.91                       0    nvidia
    cuda-runtime              11.7.1                        0    nvidia
    cudatoolkit               10.1.243             h6bb024c_0    nvidia
    cudnn                     7.6.5                cuda10.1_0    anaconda

code:

    import os
    import ctransformers

    # Set the path to the model file (this is already an absolute path,
    # so the os.path.join below simply returns it unchanged)
    download_path = '/home/username/stuff/username_storage/LLM/llama/gguf/llama-2-7b-chat.Q4_K_M.gguf'

    model_path = os.path.join(os.getcwd(), download_path)

    # Create the AutoModelForCausalLM instance
    llm = ctransformers.AutoModelForCausalLM.from_pretrained(
        model_path,
        model_type="gguf",
        gpu_layers=5,
        threads=24,
        reset=False,
        context_length=10000,
        stream=True,
        max_new_tokens=256,
        temperature=0.8,
        repetition_penalty=1.1,
    )

    # Start a conversation loop

error:

    ---------------------------------------------------------------------------
    OSError                                   Traceback (most recent call last)
    Cell In[2], line 11
          8 model_path = os.path.join(os.getcwd(), download_path)
          9 #Create the AutoModelForCausalLM class
    ---> 11 llm = ctransformers.AutoModelForCausalLM.from_pretrained(model_path, model_type="gguf", gpu_layers=5, threads=24, reset=False, context_length=10000, stream=True,max_new_tokens=256, temperature=0.8, repetition_penalty=1.1)
    
    File ~/anaconda3/envs/llm_gguf/lib/python3.10/site-packages/ctransformers/hub.py:175, in AutoModelForCausalLM.from_pretrained(cls, model_path_or_repo_id, model_type, model_file, config, lib, local_files_only, revision, hf, **kwargs)
        167 elif path_type == "repo":
        168     model_path = cls._find_model_path_from_repo(
        169         model_path_or_repo_id,
        170         model_file,
        171         local_files_only=local_files_only,
        172         revision=revision,
        173     )
    --> 175 llm = LLM(
        176     model_path=model_path,
        177     model_type=model_type,
        178     config=config.config,
        179     lib=lib,
        180 )
        181 if not hf:
        182     return llm
    
    File ~/anaconda3/envs/llm_gguf/lib/python3.10/site-packages/ctransformers/llm.py:246, in LLM.__init__(self, model_path, model_type, config, lib)
        240         raise ValueError(
        241             "Unable to detect model type. Please specify a model type using:\n\n"
        242             "  AutoModelForCausalLM.from_pretrained(..., model_type='...')\n\n"
        243         )
        244     model_type = "gguf"
    --> 246 self._lib = load_library(lib, gpu=config.gpu_layers > 0)
        247 self._llm = self._lib.ctransformers_llm_create(
        248     model_path.encode(),
        249     model_type.encode(),
        250     config.to_struct(),
        251 )
        252 if self._llm is None:
    
    File ~/anaconda3/envs/llm_gguf/lib/python3.10/site-packages/ctransformers/llm.py:126, in load_library(path, gpu)
        124 if "cuda" in path:
        125     load_cuda()
    --> 126 lib = CDLL(path)
        128 lib.ctransformers_llm_create.argtypes = [
        129     c_char_p,  # model_path
        130     c_char_p,  # model_type
        131     ConfigStruct,  # config
        132 ]
        133 lib.ctransformers_llm_create.restype = llm_p
    
    File ~/anaconda3/envs/llm_gguf/lib/python3.10/ctypes/__init__.py:374, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
        371 self._FuncPtr = _FuncPtr
        373 if handle is None:
    --> 374     self._handle = _dlopen(self._name, mode)
        375 else:
        376     self._handle = handle
    
    OSError: libcudart.so.12: cannot open shared object file: No such file or directory
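One detail visible in the traceback: `llm.py:246` calls `load_library(lib, gpu=config.gpu_layers > 0)`, i.e. the CUDA library (and with it libcudart) is only loaded when `gpu_layers` is above zero. So as a sanity check, loading with `gpu_layers=0` should run entirely on the CPU and sidestep the missing `libcudart.so.12`. A sketch, using the model path from the question and guarded with try/except so it degrades gracefully if the module or model file is absent:

```python
# With gpu_layers=0, load_library() is called with gpu=False, so
# ctransformers uses its CPU backend and never dlopens libcudart.
kwargs = dict(
    model_type="gguf",
    gpu_layers=0,   # CPU only: skips the CUDA library entirely
    threads=24,
    context_length=10000,
)

try:
    import ctransformers
    llm = ctransformers.AutoModelForCausalLM.from_pretrained(
        "/home/username/stuff/username_storage/LLM/llama/gguf/llama-2-7b-chat.Q4_K_M.gguf",
        **kwargs,
    )
    print("model loaded on CPU")
except (ImportError, OSError, ValueError) as exc:
    # Guard so the snippet fails gracefully when ctransformers or the
    # model file is missing on the machine running it.
    print(f"load failed: {exc}")
```

If this loads, the model file and ctransformers install are fine, and the problem is isolated to the CUDA backend.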

I ran the command below on my Ubuntu 18.04 LTS server:

    nvidia-smi


and got the output below:

    Tue Nov  7 17:55:41 2023       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  Off  | 00000000:42:00.0 Off |                  N/A |
    |  0%   33C    P8    10W / 260W |   1908MiB /  7974MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A     42819      C   ...3/envs/new_llm/bin/python     1905MiB |
    +-----------------------------------------------------------------------------+
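The `CUDA Version: 11.4` field in this output is the highest CUDA runtime the installed 470-series driver supports, and CUDA 12 (which provides `libcudart.so.12`) generally requires a 525-or-newer driver on Linux. So with this driver a CUDA 12 build of ctransformers can’t run on the GPU, and the options are upgrading the driver or running on CPU. A small sketch for reading that field programmatically (`max_cuda_from_driver` is a hypothetical helper; it returns None when nvidia-smi is unavailable):

```python
import re
import shutil
import subprocess

def max_cuda_from_driver():
    """Parse `nvidia-smi` output for the 'CUDA Version' field, which
    reports the highest CUDA runtime the installed driver supports."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver/tools on this machine
    out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    match = re.search(r"CUDA Version:\s*([\d.]+)", out)
    return match.group(1) if match else None

print(max_cuda_from_driver())  # e.g. "11.4" with the 470.141.03 driver above
```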


Were you able to figure this out? I’m running into the same issue.