CUDA: convert GGUF to CUDA GGUF

I'm now using llama.cpp with CUDA.
I loaded Meta-Llama-3.1-8B-Instruct-Q6_K.gguf from Hugging Face, but it was not built for GPU.
So I have to rebuild it for the GPU. How can I do that?


GGUF itself can be loaded onto the GPU without any problems; there is no separate GPU variant of the format. It is more likely that llama.cpp was built without GPU support. llama.cpp is difficult to build properly with GPU support, so it is safer to use a pre-built version.

If something is wrong with the GGUF file itself, the quickest fix is to download it again.
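For reference, this is roughly what a CUDA build looks like using llama.cpp's documented CMake flow (a sketch; the pre-built CUDA releases on the llama.cpp GitHub releases page skip this step entirely):

```bash
# Clone and build llama.cpp with CUDA enabled.
# GGML_CUDA is the current CMake option; very old guides use LLAMA_CUBLAS instead.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```

A CUDA-enabled binary should print a line similar to `ggml_cuda_init: found 1 CUDA devices` at startup; if no such line appears, the build is effectively CPU-only.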

I built llama.cpp with CUDA enabled and loaded [bartowski/Meta-Llama-3.1-8B-Instruct-GGUF](https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF).
I launch the server and it works, but it doesn't use the GPU.


In that case, it is possible that there is not enough VRAM, or that parameters such as n_gpu_layers and n_ctx are not set appropriately. If you try a very small GGUF, you should be able to tell whether the problem is VRAM or not.
Also, even if you build with CUDA specified, there are many cases where CUDA does not actually end up enabled in the binary.
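One way to check is to run a tiny model with everything offloaded and read the startup log (a sketch; the model path is a placeholder, and the exact log wording varies between llama.cpp versions):

```bash
# Run a very small GGUF with all layers offloaded (-ngl 99),
# so insufficient VRAM can be ruled out as the cause.
./build/bin/llama-cli \
  -m /path/to/some-tiny-model.gguf \
  -ngl 99 -p "Hello" -n 16
# In the startup log, look for lines similar to:
#   ggml_cuda_init: found 1 CUDA devices
#   llm_load_tensors: offloaded 25/25 layers to GPU
# "offloaded 0/25 layers" means the binary is running CPU-only.
```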

Yeah, I did that. I use llama-server; I think it needs additional options for CUDA, but I don't know exactly which ones!


I've never used it in server mode, but it seems you can specify options using the method below. The effect should be the same.
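For example, something like this (a sketch; `-ngl`/`--n-gpu-layers` and `-c`/`--ctx-size` are standard llama-server flags, but check `llama-server --help` on your build, and adjust the model path):

```bash
# Offload as many layers as possible to the GPU (-ngl 99)
# and set the context size explicitly (-c 8192).
./build/bin/llama-server \
  -m models/Meta-Llama-3.1-8B-Instruct-Q6_K.gguf \
  -ngl 99 -c 8192 \
  --host 127.0.0.1 --port 8080
```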
