An interesting project. Just looking for advice

I'm working on an AI server build and, honestly, using AI (ChatGPT) both to guide me and to teach me how to build it. A friend of mine invested in a pretty extravagant system (specs below), and here's where I'm at with it. I've been working on it in my spare time for about 3 months now, and it's been a bit difficult. I've loaded a few models, including Mistral-7B-Instruct-v0.3, and since this is all experimental for me, I really want to push the hardware to its absolute limits. However, I've been trying to get Mistral-7B-Instruct-v0.3 to actually run, and it just won't start for me. Any suggestions for making the most of my hardware? Specs as follows:

:white_check_mark: CPU: AMD EPYC Genoa 96C/192T (DDR5 ECC, ~3.7 GHz boost)
:white_check_mark: Motherboard: ASRock Rack GENOAD8X-2T/BCM (PCIe 5.0, BMC/IPMI)
:white_check_mark: Memory: 512 GB DDR5 ECC Registered (8 × 64 GB SK Hynix 4800 MT/s)
:white_check_mark: GPUs: NVIDIA A100-SXM4-40GB + NVIDIA Quadro RTX 8000
:white_check_mark: System Storage: 2 × Samsung 990 EVO 4TB NVMe (ZFS rpool, bpool)
:white_check_mark: Data Storage: 8 × Intel P4510 4TB NVMe on HighPoint SSD7580B RAID (ZFS aipool ~29 TB)

I think the specs are more than enough…
First, try running the sample code for Colab on the page below.

If speed is important, I recommend Ollama (for short prompts), or TGI or vLLM (for fast inference, including long inputs). These will make better use of your hardware.
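
As a rough sanity check on the "specs are more than enough" point, here is a minimal sketch that estimates how much GPU memory the model weights alone would need at different precisions. The parameter count (about 7.25 billion for Mistral-7B) and the 48 GB figure for the Quadro RTX 8000 are my assumptions, not numbers from the post, and the estimate ignores the KV cache, activations, and framework overhead, so treat it as a lower bound.

```python
# Back-of-the-envelope estimate of weight memory for a ~7B model.
# Assumptions (not from the original post): ~7.25e9 parameters,
# nominal GPU capacities of 40 and 48 GiB. Weights only: KV cache,
# activations, and runtime overhead are NOT included.

def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return n_params * bytes_per_param / 1024**3

N_PARAMS = 7.25e9  # assumed parameter count for Mistral-7B

GPUS_GIB = {
    "A100-SXM4-40GB": 40,   # from the spec list
    "Quadro RTX 8000": 48,  # assumed nominal capacity
}

PRECISIONS = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}

if __name__ == "__main__":
    for name, bytes_per_param in PRECISIONS.items():
        need = weight_memory_gib(N_PARAMS, bytes_per_param)
        fits = [gpu for gpu, cap in GPUS_GIB.items() if need < cap]
        print(f"{name:>10}: {need:5.1f} GiB of weights, fits on: {', '.join(fits) or 'neither GPU'}")
```

If the weights fit with plenty of room to spare at half precision, as this suggests, then a model that won't start is more likely a driver, CUDA, or library setup problem than a hardware limit.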