cuBLAS error 13 when running code with langchain.llms on GPU

Hello everyone,

I’m running into a problem when running my code in a Jupyter Notebook with langchain.llms. I’m using two GPUs (NVIDIA GeForce RTX 2060) for computation. Whenever I try to generate text with the CTransformers model, I get this error:

“cuBLAS error 13 at /home/runner/work/ctransformers/ctransformers/models/ggml/ggml-cuda.cu:5629: the function failed to launch on the GPU”
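
For reference, the number in that message is a cuBLAS status code (`cublasStatus_t`). Code 13 is `CUBLAS_STATUS_EXECUTION_FAILED`, and "the function failed to launch on the GPU" is its description. The sketch below decodes a message of this shape. The status table is a subset of the values in NVIDIA's `cublas_api.h`. The helper names (`parse_cublas_error`, `CublasError`) are hypothetical and are not part of ctransformers or LangChain.

```python
import re
from typing import NamedTuple, Optional

# Subset of cublasStatus_t values from NVIDIA's cublas_api.h.
CUBLAS_STATUS_NAMES = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}

# Matches messages like "cuBLAS error <code> at <file>:<line>: <description>".
_CUBLAS_ERROR_RE = re.compile(r"cuBLAS error (\d+) at (\S+):(\d+): (.*)")


class CublasError(NamedTuple):
    code: int
    name: str
    source_file: str
    source_line: int
    description: str


def parse_cublas_error(message: str) -> Optional[CublasError]:
    """Hypothetical helper: split a ggml cuBLAS error message into its parts."""
    match = _CUBLAS_ERROR_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1))
    return CublasError(
        code=code,
        name=CUBLAS_STATUS_NAMES.get(code, f"unknown status {code}"),
        source_file=match.group(2),
        source_line=int(match.group(3)),
        description=match.group(4),
    )


msg = ("cuBLAS error 13 at /home/runner/work/ctransformers/ctransformers/"
       "models/ggml/ggml-cuda.cu:5629: the function failed to launch on the GPU")
err = parse_cublas_error(msg)
print(err.name, err.source_line)
```

EXECUTION_FAILED is a generic kernel-launch failure, so the code by itself does not identify the root cause.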

Here’s a snippet of my code:

```python
from langchain.llms import CTransformers
from huggingface_hub import hf_hub_download
from accelerate import Accelerator

model_repo = "TheBloke/Llama-2-7B-GGUF"
model_filename = "llama-2-7b.Q4_K_M.gguf"

# This downloads the model file to a local path
model_file_path = hf_hub_download(repo_id=model_repo, filename=model_filename)

accelerator = Accelerator()

# Configuration for text generation
config = {
    'max_new_tokens': 5000,
    'temperature': 0.1,
    'repetition_penalty': 1.1,
    'context_length': 8192,
    'stream': True,
    'gpu_layers': 20
}

# Initialize CTransformers with model file path and configuration
llm = CTransformers(model=model_file_path, model_type="llama", config=config, device_map="auto")

llm, config = accelerator.prepare(llm, config)

response = llm.invoke("What is machine learning?")
print(response)
```
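
The size settings may also be worth ruling out. Llama 2 was trained with a 4096-token context, but the config above asks for `context_length` 8192 and `max_new_tokens` 5000. The hypothetical helper below compares a ctransformers-style config dict against an assumed trained context size. It is only a sketch: I'm assuming these limits are relevant, and nothing here confirms they cause the cuBLAS error.

```python
from typing import Dict, List

# Assumption: Llama 2 base models were trained with a 4096-token context window.
LLAMA2_TRAINED_CONTEXT = 4096


def check_generation_config(config: Dict, trained_context: int = LLAMA2_TRAINED_CONTEXT) -> List[str]:
    """Hypothetical helper: flag size settings larger than the model was trained for."""
    problems = []
    context_length = config.get("context_length")
    max_new_tokens = config.get("max_new_tokens")
    if context_length is not None and context_length > trained_context:
        problems.append(
            f"context_length={context_length} exceeds the trained context of {trained_context}"
        )
    # Prompt tokens and generated tokens share one context window,
    # which is capped here at the trained context size.
    effective_context = min(context_length or trained_context, trained_context)
    if max_new_tokens is not None and max_new_tokens >= effective_context:
        problems.append(
            f"max_new_tokens={max_new_tokens} leaves no room for the prompt "
            f"in a {effective_context}-token window"
        )
    return problems


config = {
    "max_new_tokens": 5000,
    "temperature": 0.1,
    "repetition_penalty": 1.1,
    "context_length": 8192,
    "stream": True,
    "gpu_layers": 20,
}
for problem in check_generation_config(config):
    print(problem)
```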

I’ve verified that I have the correct versions of CUDA (12.4) and nvcc. I’ve also tried uninstalling and reinstalling ctransformers[cuda], but I’m still encountering the error.

Additionally, I receive the following message before the error occurs: “ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 2060) as main device”.
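
ggml picks device 0 as the main device while two RTX 2060s are visible. To rule out multi-GPU splitting, I can hide the second card from the process. This is a minimal sketch that assumes the CUDA runtime loaded by ctransformers honors the standard `CUDA_VISIBLE_DEVICES` and `CUDA_DEVICE_ORDER` environment variables. They are read when CUDA initializes, so in a notebook this has to run at the top of a freshly restarted kernel, before the imports. The helper name `pin_to_single_gpu` is hypothetical.

```python
import os


def pin_to_single_gpu(index: int = 0) -> None:
    """Hypothetical helper: expose only one GPU to this process.

    Must run before importing ctransformers (or anything else that
    initializes CUDA), because CUDA reads these variables at initialization.
    """
    # Number devices in PCI bus order so the index matches nvidia-smi.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)


pin_to_single_gpu(0)

# Only after pinning:
# from langchain.llms import CTransformers
```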

I’m hoping to get some guidance on how to resolve this issue and successfully run my code on the GPU. Any help or suggestions would be greatly appreciated.

Thank you in advance!