I am trying to perform Retrieval-Augmented Generation (RAG) with LangChain and models from Hugging Face. It works, but it is very slow because the GPU does not seem to be used.
I checked the following:
- GPU on Google Colab is on (I chose T4; see the check after this list)
- Some links suggest specifying device or n_gpu_layers as arguments, but neither works (see code below)
- Specifically for the device argument, a numeric id was mentioned; I tried 0 (since cuda:0 is what is reported)
- The CUDA version of ctransformers is installed via !pip install ctransformers[cuda]
- I also tried model.to(device), but that does not seem to be possible with my configuration
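For the first point, this is roughly how I verified that the runtime actually exposes the GPU (a minimal check using torch, which is preinstalled on Colab; `!nvidia-smi` in a cell shows the same thing):

```python
# Quick sanity check that the Colab runtime exposes the T4
import torch

print(torch.cuda.is_available())      # should print True on a GPU runtime
print(torch.cuda.get_device_name(0))  # should print something like "Tesla T4"
```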
Here's the code, though I also tried the variations mentioned above:
```python
from langchain.llms import CTransformers
from langchain.chains import RetrievalQA
import time

# Test Llama 2 through the LangChain CTransformers wrapper
llm = CTransformers(
    device="cuda:0",
    n_gpu_layers=110,
    n_batch=512,
    model="TheBloke/Llama-2-7B-Chat-GGUF",
    model_type="llama",
    config={"context_length": 4096, "max_new_tokens": 1024},
)

rag_pipeline = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,  # retriever is built earlier in my notebook
)

start_time = time.time()
output = rag_pipeline.invoke("Is a license compulsory for triathlon?")
print("--- %s seconds ---" % (time.time() - start_time))
print(output)
```
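For completeness, the GPU arguments above are my attempt to map the plain ctransformers API onto the LangChain wrapper; the ctransformers README documents a gpu_layers parameter for direct use, like this sketch (I may be translating it to the wrapper incorrectly, and the model_file name here is my assumption about which quantized file to pick from the repo):

```python
# Direct ctransformers usage per its README: gpu_layers sets how many
# transformer layers are offloaded to the GPU (requires ctransformers[cuda])
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-Chat-GGUF",
    model_file="llama-2-7b-chat.Q4_K_M.gguf",  # assumed file name in the repo
    model_type="llama",
    gpu_layers=50,  # number of layers to offload; 0 means CPU only
)
print(llm("Is a license compulsory for triathlon?"))
```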