Help Needed: Installing Llama 2 70B, Llama 3 70B & LLaMA 2 30B (FP16) on Windows Locally

Hi everyone,

I’m trying to install Llama 2 70B, Llama 3 70B, and LLaMA 2 30B (FP16) locally on my Windows gaming rig, which has dual RTX 4090 GPUs. My goal is to run these models offline from the terminal. I’ve hit a few roadblocks and could really use some help.

Here are the specifics of my setup:

Windows 10
Dual MSI RTX 4090 Suprim Liquid X 24GB GPUs
Intel Core i9 14900K 14th Gen Desktop Processor
64GB DDR5 RAM
2x Samsung 990 Pro 2TB Gen4 NVMe SSD

Has anyone successfully installed and run these models on a similar setup? If so, could you provide detailed steps or point me to relevant resources? Any tips on optimizing the installation for dual GPUs would be greatly appreciated as well.

Thanks in advance for your assistance!

Hi,

First, I’d create a Python virtual environment, install PyTorch (with CUDA support) into it, and verify that the following works:

import torch

print(torch.cuda.is_available())
print(torch.cuda.device_count())

This should print True and 2, confirming that both GPUs are visible to PyTorch. It requires a CUDA-enabled build of PyTorch and a recent NVIDIA driver (CUDA is the software layer that lets PyTorch run on the GPUs).
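As an optional extra check (my addition, not something you strictly need), you can also print the CUDA version your PyTorch build was compiled against and the name of each visible device, to confirm both 4090s are detected:

import torch

# CUDA version the installed PyTorch build was compiled with
print(torch.version.cuda)

# Name of every GPU PyTorch can see (should list both RTX 4090s)
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))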

Next, you can install the Transformers library as explained in the installation guide and perform inference with the Llama models as shown in the docs. Note that the official meta-llama checkpoints on the Hub are gated, so you’ll need to request access and log in with your Hugging Face token first.
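Here’s a minimal sketch of what that inference code could look like on your dual-GPU setup. The model ID, the 4-bit quantization, and the generation settings are my assumptions, not steps from the docs: a 70B model in FP16 is roughly 140 GB of weights, so it won’t fit in 2x 24 GB of VRAM without quantization or offloading.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Example checkpoint ID (assumption); requires approved access on the Hub
model_id = "meta-llama/Meta-Llama-3-70B-Instruct"

# 4-bit quantization (needs `pip install accelerate bitsandbytes`) so the
# 70B weights can fit across the two 24 GB GPUs
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shards the layers across both GPUs automatically
)

inputs = tokenizer("Explain what a GPU does in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With device_map="auto", Accelerate decides how to split the layers across the two GPUs; if a model still doesn’t fit, you can additionally pass max_memory or an offload_folder to spill part of the weights to CPU RAM or disk.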