I also have a Gigabyte 4080 that’s sitting around collecting dust in another computer. I want to add the 4080 to my current rig to increase VRAM (16 GB 4080 + 24 GB 4090 = 40 GB) and hopefully improve tokens-per-second speeds.
The 4090 sits in the PCI_E1: PCIe 5.0 x16 (from CPU) slot on the MSI X670E MAG motherboard. It’s so fat that it takes up three slots. The only other available slot to mount the 4080 is the PCI_E4: PCIe 4.0 x2 (from X670 chipset).
My main question is this: Will using the open PCIe 4.0 x2 slot (from the X670 chipset) cause the 4080 to underperform? Or does it not matter, since I’m only using it for the VRAM?
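For anyone wanting to verify what a slot like that actually negotiates, here’s a minimal sketch, assuming the nvidia-ml-py package (imported as pynvml) is installed. It prints each card’s current PCIe generation, link width, and VRAM, so you can confirm whether the 4080 really ends up on a Gen4 x2 link:

```python
# Minimal sketch: report the negotiated PCIe link and total VRAM per GPU.
# Assumes the nvidia-ml-py package (imported as pynvml) is installed.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        gen = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(handle)
        width = pynvml.nvmlDeviceGetCurrPcieLinkWidth(handle)
        max_gen = pynvml.nvmlDeviceGetMaxPcieLinkGeneration(handle)
        max_width = pynvml.nvmlDeviceGetMaxPcieLinkWidth(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {i} ({name}): PCIe Gen{gen} x{width} "
              f"(card supports Gen{max_gen} x{max_width}), "
              f"{mem.total / 1024**3:.0f} GiB VRAM")
finally:
    pynvml.nvmlShutdown()
```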
Side question:
I also have a 3080 that’s supposed to go on eBay…and now I’m wondering if it’s possible to add that into the mix as well (without complicated coding/etc being required to make it work).
The only thing that I can think of is:
1) Buy a GPU mining rack to mount the 4090, 4080, and 3080. Then buy a riser cable for all three cards. I already have an 850W Corsair PSU to power the 4080 (and hopefully it will allow me to plug the 3080 in there as well).
2) Use the following slots for each card:
4090: PCI_E1: PCIe 5.0 x16 (from CPU)
4080: PCI_E4: PCIe 4.0 x2 (from X670 chipset)
3080: PCI_E3: up to PCIe 4.0 x4
I’m wondering if the PCI_E4 and PCI_E3 slots will cause the 4080 and 3080 to run at slow speeds? Or is that not how that works?
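For reference, here’s a minimal sketch of how a split across three unevenly sized cards might be driven with llama-cpp-python (assuming it was built with CUDA support; the GGUF filename below is just a placeholder):

```python
# Sketch: offload a GGUF model across three unevenly sized GPUs.
# Assumes llama-cpp-python built with CUDA support; the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3-70b-instruct.Q5_K_M.gguf",  # hypothetical filename
    n_gpu_layers=-1,                # try to offload every layer to the GPUs
    tensor_split=[24, 16, 10],      # weight the split roughly by each card's VRAM (4090/4080/3080)
    n_ctx=8192,                     # context window; shrink if you run out of memory
)

out = llm("Draft three value propositions for a B2B email tool.", max_tokens=256)
print(out["choices"][0]["text"])
```

The tensor_split values are relative proportions, so listing each card’s VRAM is a reasonable starting point; note that CUDA device order may not match the physical slot numbering.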
I’ve had some experience configuring rigs for AI, and I understand your concern about maximizing performance. On your main question: the PCIe 4.0 x2 slot will limit the 4080’s bandwidth, which matters most if your workload moves a lot of data between the CPU and the GPUs. If you’re primarily after the extra VRAM for inference, it should still serve you adequately; with layer-split inference only small activations cross the bus each token, so the narrow link mostly shows up as slower model loading.
As for your side question, the GPU mining rack and riser cables sound feasible, but double-check riser compatibility and the power requirements before committing. On the slots themselves: the 4080 and 3080 will run at whatever link width those slots provide, so there may be some performance hit, but how much depends on the specific tasks you’re running.
As far as I know, video cards only work in full compatibility if they are identical - for example, two 4090s with the same amount of memory. If you connect two different video cards, performance will be limited by the weaker of the two.
Thank you. My main goal is to try to run Llama 3 70B Instruct (no quants) locally. I know I can use HuggingFace Chat…but I don’t like sharing my data. I’m unsure whether that’s even possible with the small amount of VRAM I have.
I use AI for writing purposes (content marketing and copywriting). So, in addition to a “decent” speed (anything better than 1-2 tokens per second for a Q5 quant of Llama 3 70B Instruct), I’m also looking for a model intelligent enough to help me reason through things like value props, audiences, etc.
Do you think I will run into performance limitations based on my use case? I have no desire to fine-tune models on this rig, only to run the largest model possible. When I decide to fine-tune one day, I’ll rent GPUs in the cloud to handle that task.
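As a rough feasibility check on the VRAM side, here’s some back-of-the-envelope arithmetic (the bits-per-weight figures are approximate, and it ignores the KV cache and runtime overhead):

```python
# Rough VRAM arithmetic for a 70B-parameter model (back-of-the-envelope only;
# real GGUF files carry some overhead, and the KV cache grows with context length).
PARAMS = 70e9

bytes_per_weight = {
    "FP16 (no quant)": 2.0,
    "Q5_K_M (~5.5 bits/weight)": 5.5 / 8,
    "Q4_K_M (~4.8 bits/weight)": 4.8 / 8,
}

available_gib = 24 + 16 + 10  # 4090 + 4080 + 3080, roughly

for name, bpw in bytes_per_weight.items():
    size_gib = PARAMS * bpw / 1024**3
    fits = "fits" if size_gib < available_gib else "does NOT fit"
    print(f"{name}: ~{size_gib:.0f} GiB -> {fits} in ~{available_gib} GiB of VRAM")
```

By that math, unquantized 70B (~130 GiB of weights alone) is far out of reach of 40-50 GB of VRAM, while a Q4/Q5 GGUF split across the cards is the realistic target.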