Using the GPU for a ggml model with LangChain

I’m currently using a ggml-format model (13b-chimera.ggmlv3.q4_1.bin) in an app built with LangChain. Despite running it on a VM with a GPU, the program still only uses the CPU.

I’ve tried adding the line `torch.cuda.set_device(torch.device("cuda:0"))`, which raises `AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'`. However, `torch.cuda.is_available()` returns `True`, indicating that CUDA can find the GPU.
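For reference, here is the quick diagnostic I ran to check what the PyTorch build reports about CUDA (this runs safely even on a CPU-only build):

```python
import torch

# Confirm whether this PyTorch build actually includes CUDA support.
print("CUDA available:", torch.cuda.is_available())
print("Compiled CUDA version:", torch.version.cuda)  # None on CPU-only builds
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```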

Can ggml models work with GPUs in the first place? Do I need to use another format like GPTQ? If so, how would I implement it in the program, since ggml models are a single .bin file while GPTQ models seem to be a collection of files?

You can check out the GPU instructions here: https://python.langchain.com/docs/integrations/llms/llamacpp#gpu
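Following those docs, a minimal sketch of GPU offloading with LangChain's `LlamaCpp` wrapper might look like the block below. This assumes `llama-cpp-python` was compiled with CUDA support (e.g. `CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --no-cache-dir`); the `n_gpu_layers` value is an illustrative guess to tune for your VRAM:

```python
from langchain.llms import LlamaCpp

# ggml models are loaded by llama.cpp, not PyTorch, so torch.cuda settings
# have no effect here. GPU offload is controlled by n_gpu_layers instead.
llm = LlamaCpp(
    model_path="13b-chimera.ggmlv3.q4_1.bin",
    n_gpu_layers=40,  # how many layers to offload to the GPU; tune for your VRAM
    n_batch=512,
    verbose=True,     # the startup log should show layers being offloaded
)

print(llm("Q: Name the planets in the solar system. A:"))
```

If the startup log shows `BLAS = 1` and layers being assigned to the GPU, the offload is working; if not, `llama-cpp-python` was likely built without CUDA and needs to be reinstalled with the flags above.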