I want to buy/build a dedicated machine for local LLM usage. My priority is quality rather than speed, so I've looked into machines with the capability for lots of "unified memory", rather than GPU systems with dedicated VRAM that is fast but small. My budget is "the cheaper the better". I've looked at the NVIDIA DGX Spark, but I must say that for "only" 128 GB of LPDDR5X unified memory, the price is too high in my mind.
The first thing I should say is: head to the HF Discord. There are people from many countries and regions who are knowledgeable about generative AI, so it's a great place for hardware purchase consultations, especially for GPUs and CPUs.
Now, if you can limit your use to LLMs, the latest AMD hardware isn't bad, but CUDA is also attractive…
The proprietary OS seems fast, but I’m worried about the breadth of use cases.
In any case, Discord really is reliable for this. If you're buying something expensive, it's often cheaper to first ask someone who already owns it. I haven't bought a new environment for AI myself yet… I'm just watching from the sidelines.
Good luck!
I have a similar need, because I can't use cloud computing for tests with sensitive user data.
And as far as I've gotten with figuring out the amazing HF universe, it seems everything is uploaded/processed in the cloud.
For example, FastRTC seems coupled to HF when a temporary telephone number is acquired. It's often not clear to me where the data gets processed.
My interests are in speech synthesis, for making apps accessible by voice,
and in analyzing documents (PDFs) for classification and summarization to build a sorting app.
But it's hard for me to figure out what's still possible on my local computer, or whether I can build such AI-driven applications at all. The latter wouldn't normally be able to run on "ordinary" hardware, right?
I'm not very experienced, but I think you could achieve this with a consumer GPU if you connect an ASR model like Whisper that runs locally to a tool that converts PDFs to markdown or text.
If you want to use FastRTC for confidential documents, you can set up your own server, but if you want to run things completely locally, it would probably be easier to use a local model such as Whisper from the start. These models are fine as long as you have about 2GB of VRAM.
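Roughly, a fully local setup could look like this minimal sketch using the transformers ASR pipeline (the model name, device index, and audio file name are just example assumptions):

```python
# A minimal sketch of fully local speech-to-text with Whisper via the
# transformers pipeline; the model name and audio file are placeholders.
from transformers import pipeline

# The model is downloaded once, then runs offline from the local cache.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",  # fits comfortably in ~2GB of VRAM
    device=0,  # GPU index; use device=-1 for CPU-only
)

result = asr("recording.wav")
print(result["text"])
```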
I think you can use almost any LLM to summarize. The appropriate model depends on the purpose and the complexity of the documents you are working with, but most LLMs can handle this task even if they are small. An LLM will use as much VRAM as you can give it, but if you just want to summarize, even a quantized model that fits in about 5GB of VRAM should not normally cause any problems. If you have 12GB of VRAM to spare, you can achieve this comfortably; that's a mid-range GeForce.
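For the summarization side, here is a rough sketch of what a quantized local model could look like via transformers + bitsandbytes (the model name, file name, and prompt are assumptions, not specific recommendations; pick whatever fits your VRAM budget):

```python
# A rough sketch of local summarization with a small quantized LLM.
# Model name, file name, and prompt are example assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen2.5-7B-Instruct"  # a 7B model in 4-bit is roughly 4-5GB

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",  # needs the accelerate package
)

document_text = open("document.txt", encoding="utf-8").read()
messages = [{"role": "user",
             "content": f"Summarize the following document:\n\n{document_text}"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=300)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```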
For more information, please refer to the Hugging Face Audio Course, which explains how to build the framework for various things like this.
PDF converters also work locally (if the models have already been downloaded).
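For digitally-born PDFs you don't even need a model for the text extraction step. A small offline sketch using pypdf (the file name is a placeholder):

```python
# A small sketch of offline PDF text extraction with pypdf;
# "document.pdf" is a placeholder. Nothing leaves your machine.
from pypdf import PdfReader

reader = PdfReader("document.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)
print(text[:500])  # this text can be fed to the local summarizer above
```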