Nvidia P40 and Llama 2

masterchop · August 15, 2023, 1:23am

Hello, I am trying to get some hardware to work with Llama 2. My current hardware works fine, but it is a bit slow and I can't load the full models.
I had to go with quantized versions, even though they are a bit slow at inference time.
I saw that the Nvidia P40 isn't badly priced and has a good 24 GB of VRAM, so I am wondering whether I could use one or two of them to run Llama 2 and speed up inference.
I have seen a lot of people say there are some limitations, others say they are a pain, and others say they should work just fine, so I opened my own question to see if someone can shed some light on this topic.
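
To make the VRAM side of the question concrete, here is a rough back-of-the-envelope sketch. It assumes weight memory is simply parameter count times bytes per parameter, uses the nominal 7B/13B/70B sizes, and treats each P40 as a full 24 GiB. It ignores the KV cache, activations, and framework overhead, so real usage will be higher, and it says nothing about speed. The function names are made up for illustration.

```python
# Rough weight-only VRAM estimate (assumption: weights dominate memory).
# Ignores KV cache, activations, and framework overhead.
GIB = 1024 ** 3

# Nominal Llama 2 parameter counts; real checkpoints differ slightly.
MODELS = {"7B": 7e9, "13B": 13e9, "70B": 70e9}

# Approximate storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_gib(params, bytes_per_param):
    """Return the approximate size of the weights in GiB."""
    return params * bytes_per_param / GIB


def fits(params, bytes_per_param, num_gpus, vram_gib=24.0):
    """Check whether the weights alone fit in the combined VRAM."""
    return weight_gib(params, bytes_per_param) <= num_gpus * vram_gib


for name, params in MODELS.items():
    for precision, nbytes in BYTES_PER_PARAM.items():
        size = weight_gib(params, nbytes)
        one = "yes" if fits(params, nbytes, 1) else "no"
        two = "yes" if fits(params, nbytes, 2) else "no"
        print(f"{name} {precision}: {size:.1f} GiB, 1x P40: {one}, 2x P40: {two}")
```

Under these assumptions, a model that fits only on two cards would also need to be split across them (for example by layers), which this estimate does not model.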

Thanks for any response.
