Help Needed: Installing Llama 2 70B, Llama 3 70B & LLaMA 2 30B (FP16) on Windows Locally

Hi everyone,

I’m trying to install Llama 2 70B, Llama 3 70B, and LLaMA 2 30B (FP16) locally on my Windows gaming rig, which has dual RTX 4090 GPUs. My goal is to run these models offline from the terminal. I’ve hit a few roadblocks and could really use some help.

Here are the specifics of my setup:

Windows 10
Dual MSI RTX 4090 Suprim Liquid X 24GB GPUs
Intel Core i9 14900K 14th Gen Desktop Processor
64GB DDR5 RAM
2x Samsung 990 Pro 2TB Gen4 NVMe SSD

Has anyone successfully installed and run these models on a similar setup? If so, could you provide detailed steps or point me to relevant resources? Any tips on optimizing the installation for dual GPUs would be greatly appreciated as well.

Thanks in advance for your assistance!

Hi,

First, I’d create a Python virtual environment, install PyTorch (with CUDA support) into it, and verify that the following works:

import torch

print(torch.cuda.is_available())
print(torch.cuda.device_count())

This should print True and 2, confirming that both GPUs are visible to PyTorch. It requires a CUDA-enabled build of PyTorch and a recent NVIDIA driver (CUDA is the software layer that lets PyTorch run on the GPUs).
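As an optional extra check (my addition, not something you strictly need), you can also print the CUDA version your PyTorch build was compiled against and the name of each visible device, to confirm both 4090s are detected:

import torch

# CUDA version the installed PyTorch build was compiled with
print(torch.version.cuda)

# Name of every GPU PyTorch can see (should list both RTX 4090s)
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))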

Next, you can install the Transformers library as explained in the installation guide and perform inference with the Llama models as shown in the docs. Note that the official meta-llama checkpoints on the Hub are gated, so you’ll need to request access and log in with your Hugging Face token first.
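Here’s a minimal sketch of what that inference code could look like on your dual-GPU setup. The model ID, the 4-bit quantization, and the generation settings are my assumptions, not steps from the docs: a 70B model in FP16 is roughly 140 GB of weights, so it won’t fit in 2x 24 GB of VRAM without quantization or offloading.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Example checkpoint ID (assumption); requires approved access on the Hub
model_id = "meta-llama/Meta-Llama-3-70B-Instruct"

# 4-bit quantization (needs `pip install accelerate bitsandbytes`) so the
# 70B weights can fit across the two 24 GB GPUs
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shards the layers across both GPUs automatically
)

inputs = tokenizer("Explain what a GPU does in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With device_map="auto", Accelerate decides how to split the layers across the two GPUs; if a model still doesn’t fit, you can additionally pass max_memory or an offload_folder to spill part of the weights to CPU RAM or disk.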