I am building a server with 124 GB of VRAM from 5x Tesla K80 and 1 Quadro K5000, running off 2 Xeon E5-2690 v4 CPUs with 192 GB of RAM, with Proxmox installed. The use case is running Home Assistant for 2 to 3 homes, running an offline mobile LLM, and running a local coding assistant. Would this suffice for my use case? What models would you recommend?
It seems that even fairly small models can handle tasks like these. With those specifications, it should be possible to run multiple 32B models simultaneously… Is that overkill?
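Rough back-of-the-envelope numbers behind that claim (the quantization level and overhead allowance below are assumptions, not measurements):

```python
# Back-of-the-envelope VRAM budget; quantization and overhead figures are rough assumptions.
gpus_gb = [24] * 5 + [4]        # 5x Tesla K80 (24 GB each) + Quadro K5000 (4 GB)
total_vram_gb = sum(gpus_gb)    # 124 GB across all cards
# Note: each K80 presents as two 12 GB GPUs, so a ~20 GB model must be split across devices.

params_b = 32                   # 32B-parameter model
bytes_per_param = 0.5           # ~4-bit quantization (e.g. a Q4 GGUF)
kv_and_overhead_gb = 4          # rough allowance for KV cache + runtime buffers

per_model_gb = params_b * bytes_per_param + kv_and_overhead_gb    # ~20 GB
models_that_fit = int(total_vram_gb // per_model_gb)              # ~6 concurrent 32B models, on paper

print(f"Total VRAM: {total_vram_gb} GB")
print(f"Approx. footprint per 32B model (4-bit): {per_model_gb:.0f} GB")
print(f"32B models that fit on paper: {models_that_fit}")
```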
Larger models tend to be more versatile, so I think Instruct models (models tuned for chat) of 7B or larger should be suitable for most of these applications…
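As a concrete starting point, here is a minimal sketch of talking to a 7B instruct model with llama-cpp-python; the model path and parameters are placeholders you would swap for whatever GGUF you actually download and for your own GPU split:

```python
# Minimal chat sketch using llama-cpp-python; the model path and settings are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="/models/your-7b-instruct.Q4_K_M.gguf",  # any 7B instruct GGUF on disk
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if VRAM allows
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a home automation assistant."},
        {"role": "user", "content": "Turn on the living room lights at sunset."},
    ],
    max_tokens=256,
)

print(response["choices"][0]["message"]["content"])
```

For the Home Assistant and coding-assistant pieces you could instead expose the same model over llama-cpp-python's OpenAI-compatible server (`python -m llama_cpp.server`) so each VM just talks to it over HTTP.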