cuBLAS error 13 when running code with langchain.llms on GPU

YoussL · May 6, 2024, 2:53pm

Hello everyone,

I’m encountering an issue while running my code in a Jupyter Notebook with langchain.llms. I’m using two GPUs (both NVIDIA GeForce RTX 2060) for computations, and I keep getting the error “cuBLAS error 13 at /home/runner/work/ctransformers/ctransformers/models/ggml/ggml-cuda.cu:5629: the function failed to launch on the GPU” when trying to generate text using the CTransformers model.

Here’s a snippet of my code:

from langchain.llms import CTransformers
from huggingface_hub import hf_hub_download
from accelerate import Accelerator

model_repo = "TheBloke/Llama-2-7B-GGUF"
model_filename = "llama-2-7b.Q4_K_M.gguf"

# This downloads the model file to a local path
model_file_path = hf_hub_download(repo_id=model_repo, filename=model_filename)

accelerator = Accelerator()

# Configuration for text generation
config = {
    'max_new_tokens': 5000,
    'temperature': 0.1,
    'repetition_penalty': 1.1,
    'context_length': 8192,
    'stream': True,
    'gpu_layers': 20
}

# Initialize CTransformers with model file path and configuration
llm = CTransformers(model=model_file_path, model_type="llama", config=config, device_map="auto")

llm, config = accelerator.prepare(llm, config)

response = llm.invoke("What is machine learning?")
print(response)

I’ve verified that I have the correct versions of CUDA (12.4) and nvcc. I’ve also tried uninstalling and reinstalling ctransformers[cuda], but I’m still encountering the error.
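For reference, the numeric code in the error message can be mapped to its name in the cuBLAS `cublasStatus_t` enum, which makes the error easier to search for. The following is a minimal sketch; `decode_cublas_error` is a hypothetical helper written for this post, not part of ctransformers or langchain.

```python
import re

# Numeric values of the cublasStatus_t enum as defined in the cuBLAS headers.
CUBLAS_STATUS = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}


def decode_cublas_error(message: str) -> str:
    """Hypothetical helper: extract the status code from a ggml-style
    'cuBLAS error N at ...' message and return the enum name."""
    match = re.search(r"cuBLAS error (\d+)", message)
    if match is None:
        return "no cuBLAS status code found"
    code = int(match.group(1))
    return CUBLAS_STATUS.get(code, f"unknown cuBLAS status {code}")


message = (
    "cuBLAS error 13 at /home/runner/work/ctransformers/ctransformers/"
    "models/ggml/ggml-cuda.cu:5629: the function failed to launch on the GPU"
)
print(decode_cublas_error(message))  # prints CUBLAS_STATUS_EXECUTION_FAILED
```

So error 13 here is `CUBLAS_STATUS_EXECUTION_FAILED`, which matches the "failed to launch on the GPU" text in the message.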

Additionally, I receive the following message before the error occurs: “ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 2060) as main device”.
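Since two GPUs are present, one way to narrow this down is to expose only a single device to CUDA and see whether the error still occurs. The sketch below sets the standard `CUDA_VISIBLE_DEVICES` environment variable; `pin_to_single_gpu` is a hypothetical helper name, and whether this changes the outcome for error 13 is an assumption to test, not a confirmed fix.

```python
import os


def pin_to_single_gpu(device_index: int = 0) -> str:
    """Hypothetical helper: make only one GPU visible to CUDA in this process.

    CUDA reads CUDA_VISIBLE_DEVICES when it initialises, so this has to run
    before any CUDA-backed library (such as ctransformers) touches the GPU.
    """
    value = str(device_index)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Run this in the first notebook cell, before importing langchain or ctransformers.
pin_to_single_gpu(0)
print(os.environ["CUDA_VISIBLE_DEVICES"])  # prints 0
```

In Jupyter, the kernel needs to be restarted first if a previous cell has already initialised CUDA; otherwise the new value is not picked up.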

I’m hoping to get some guidance on how to resolve this issue and successfully run my code on the GPU. Any help or suggestions would be greatly appreciated.

Thank you in advance!
