Local RAG setup for lawyers using Mistral & LangChain – feasibility & hardware feedback?

Hi everyone,
I’m currently evaluating whether a local AI setup on a gaming PC could be useful for a small law firm. The idea is:

  1. Store all legal documents related to a specific case category (e.g. “defects in car sales”) in a RAG database
  2. Upload case-specific documents and correspondence to a chat interface
  3. Use prompts to generate summaries, key data points, and maybe even draft letters or pleadings

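For steps 2 and 3, the query side could look roughly like the sketch below. This is only a minimal sketch, not a finished implementation: it assumes the documents are already indexed into a local Chroma store (see the ingestion sketch further down), that a quantized Mistral GGUF runs through llama.cpp, and that LangChain's community integrations are installed; all paths, model names, and parameters are placeholders.

```python
# Minimal query-side sketch (steps 2-3): retrieve relevant chunks for a case
# and ask a local Mistral model (GGUF via llama.cpp) for a summary.
# Paths, model names, and the embedding model are placeholders/assumptions.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import LlamaCpp

embeddings = HuggingFaceEmbeddings(model_name="intfloat/multilingual-e5-large")
store = Chroma(persist_directory="./case_index", embedding_function=embeddings)
retriever = store.as_retriever(search_kwargs={"k": 6})

llm = LlamaCpp(
    model_path="./models/mistral-small-24b-q4_k_m.gguf",  # hypothetical file name
    n_gpu_layers=-1,   # offload as many layers as fit into VRAM
    n_ctx=8192,
)

question = "Summarize the key defects alleged in this car-sale dispute."
context = "\n\n".join(d.page_content for d in retriever.invoke(question))
print(llm.invoke(
    f"Answer using only the following excerpts:\n\n{context}\n\nQuestion: {question}"
))
```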
Copilot or ChatGPT is not an option here because all documents and correspondence would need to be anonymized beforehand — and no one realistically does that. A local approach keeps all data inside the firm, which is much more acceptable from a privacy standpoint.

My idea:

  • Local AI PC (gaming hardware) for confidential legal document processing using a RAG setup
  • Access through a local web frontend with authorization
  • No cloud access → fully local, data privacy is critical
  • Planned setup: Linux + Mistral / Mixtral + llama.cpp + LangChain + OpenGPTs — does that make sense?
  • Hugging Face as source for models, datasets, and potential transformer experiments
  • RAG: indexing our own documents & legal texts (~10,000 PDFs, rulings, notes, correspondence); a rough ingestion sketch follows below
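Here is a rough ingestion sketch for the ~10,000 PDFs, assuming LangChain's community loaders, a multilingual sentence-transformers embedding model, and Chroma as the local vector store (all of these are placeholder choices, not a recommendation):

```python
# Ingestion sketch: load PDFs, chunk them, embed, and persist a local index.
# Directory, embedding model, and chunk sizes are placeholder values.
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
embeddings = HuggingFaceEmbeddings(model_name="intfloat/multilingual-e5-large")

chunks = []
for pdf in Path("./documents/defects_car_sales").glob("**/*.pdf"):
    chunks.extend(splitter.split_documents(PyPDFLoader(str(pdf)).load()))

Chroma.from_documents(chunks, embeddings, persist_directory="./case_index")
```

Note that scanned PDFs would need an OCR pass before this step, since the plain PDF loader only extracts embedded text.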

My question to you all:
Could this setup realistically reduce workload in a productive environment, or is it still just a toy/research tool?
(My take: ChatGPT/Copilot could already save lawyers hours — if they could upload everything without concern. But in my opinion that isn't legal, due to privacy laws / GDPR. Maybe a category-specific local RAG setup could even outperform current cloud LLMs in this narrow domain?)

About the hardware:

Here’s the machine I’ve configured — what do you think of it in terms of suitability and price/performance?

1 × Be Quiet! PURE BASE 501 - Airflow White BG075 +49,00 €
1 × Intel Core i5-13600KF - 6x 3.50GHz + 8x 2.60GHz (up to 5.1GHz)
1 × Polartherm by Thermal Grizzly
1 × Arctic Freezer 36 Black CPU Cooler -35,00 €
1 × ASUS TUF B760-Plus WIFI | Intel B760 DDR4 - LGA1700
1 × 32GB DDR4 3600MHz Corsair Vengeance RGB PRO SL CL18 +24,00 €
1 × Nvidia GeForce RTX 5070 TI 16GB GDDR7 with DLSS 4
1 × 850W MSI MAG A850GL PCIE5 80+ Gold
1 × 1TB M.2 SSD (NVMe) MSI Spatium M470 Pro PCIe 4.0
1 × Windows 11 64Bit Pro
1 × 3-year warranty & 30-day return (included)

Total: €1,772.90


Could this setup realistically reduce workload in a productive environment, or is it still just a toy/research tool?

I think this is something you have to try in practice to judge…

Database retrieval should work normally with RAG as long as there are no bugs, but producing the answer from the retrieved context is the LLM's job. So it depends on the LLM's reasoning ability, and with the VRAM sizes of GPUs available to the general public, you can't run very large models. With 16GB you'd probably end up using a model of around 16B parameters or less, even in llama.cpp's GGUF format. In many cases this may be sufficient if multilingual or multitasking capabilities aren't required, but the results will vary significantly depending on each model's strengths. Mistral Small 24B usually performs quite well. However, I'm not sure if it will meet your standards…
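As a rough back-of-the-envelope check (my own assumption of ~4.8 bits per weight for a Q4_K_M GGUF quant; KV cache and runtime overhead come on top of this):

```python
# Rough weight-memory estimate for Q4_K_M GGUF quants (~4.8 bits/weight).
# KV cache and CUDA/runtime overhead are additional.
def weights_gib(params_billion: float, bits_per_weight: float = 4.8) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

for size in (14, 24, 32):
    print(f"{size}B @ Q4_K_M ≈ {weights_gib(size):.1f} GiB of weights")
# 14B ≈ 7.8 GiB, 24B ≈ 13.4 GiB, 32B ≈ 17.9 GiB
# -> a 24B quant is already tight on a 16 GB card once the context grows;
#    anything larger needs partial CPU offload.
```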

Here’s the machine I’ve configured — what do you think of it in terms of suitability and price/performance?

1 × Intel Core i5-13600KF - 6x 3.50GHz + 8x 2.60GHz (up to 5.1GHz)

Even a weaker CPU is not a big deal here; a Ryzen might be better in terms of performance per watt and core count, but either way it doesn't matter much. Also, for AI workloads the CPU doesn't get as hot as the GPU, so a retail cooler is fine.

1 × 32GB DDR4 3600MHz Corsair Vengeance RGB PRO SL CL18 +24,00 €

If possible, 64GB or more is recommended. During model loading and inference, it is generally safer to have plenty of system RAM on top of the VRAM.

1 × Nvidia GeForce RTX 5070 TI 16GB GDDR7 with DLSS 4

In short, the GPU generation, VRAM size, and clock speed are the most important factors, and everything else is secondary.

The 50x0 series performs well, but because it requires CUDA Toolkit and PyTorch versions close to the latest, many libraries are still incompatible or not well supported. If this isn't for research purposes, I recommend getting a used 4090 or 4080 instead.
The detailed situation varies by country and region, so it's best to ask on Discord for specifics.
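If you do go with a 50x0 card, a quick way to see whether your PyTorch build actually recognizes it (just a sketch; the exact minimum versions change quickly):

```python
# Quick compatibility check: a 50x0 (Blackwell) card needs a recent PyTorch
# build with a matching CUDA toolkit; older wheels will report the GPU as
# unsupported or fail at kernel launch.
import torch

print("torch:", torch.__version__, "cuda:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0),
          "compute capability:", torch.cuda.get_device_capability(0))
```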

As for the overall price… well, considering the GPU, it’s about right. GPUs are expensive everywhere.