Need Suggestions for LLM Models Suitable for 250GB RAM Server

Hi All,

I have a server with 250GB of RAM but no GPU. I’ve attempted to run some quantized Llama 3 models, such as:

  • unsloth’s Llama-3.3-70B-Instruct-Q5_K_M.gguf and Llama-3.3-70B-Instruct-Q3_K_M.gguf

However, I’ve been unable to load them due to RAM limitations.
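For what it’s worth, here is the back-of-the-envelope memory math I tried before posting. The architecture numbers (80 layers, 8 KV heads, head dim 128) and the ~50 GB file size are my assumptions for this model, so please correct me if they’re off:

```python
def kv_cache_bytes(n_ctx, n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Approximate f16 KV-cache size: keys + values, per token, per layer.

    The layer/head counts here are assumed values for Llama-3.3-70B,
    not read from the GGUF itself.
    """
    return n_ctx * n_layers * n_kv_heads * head_dim * bytes_per_elem * 2

file_size_gb = 50  # rough size of the Q5_K_M GGUF on disk (assumption)
kv_gb = kv_cache_bytes(8192) / 1024**3  # 8K-token context

# Resident memory should be roughly file size + KV cache + overhead.
print(f"~{file_size_gb + kv_gb:.1f} GB estimated")
```

By this estimate the model should fit well within 250GB, so I suspect something else (context size, mmap settings, or another process) is eating the RAM.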

I’m looking for an LLM that can run within my 250GB RAM setup. My tasks involve basic question-answering, such as analyzing a call transcript (in JSON format) between an agent and a customer to determine:

  • Whether the agent introduced themselves,
  • Whether the agent resolved the issue, or
  • Whether the issue was escalated to a supervisor.
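To make the task concrete, this is roughly the prompt I plan to build from each transcript and feed to whatever local model is suggested (e.g. via llama-cpp-python). The JSON field names (`speaker`, `text`) and the sample transcript are just illustrative, not my real data:

```python
import json

# The three yes/no checks from my task description.
QUESTIONS = [
    "Did the agent introduce themselves?",
    "Did the agent resolve the issue?",
    "Was the issue escalated to a supervisor?",
]

def build_prompt(transcript):
    """Flatten a [{"speaker": ..., "text": ...}, ...] transcript into a
    yes/no classification prompt. Field names are assumed, not standard."""
    lines = [f'{turn["speaker"]}: {turn["text"]}' for turn in transcript]
    numbered = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(QUESTIONS))
    return (
        "Answer each question with only 'yes' or 'no'.\n\n"
        "Transcript:\n" + "\n".join(lines) + "\n\n"
        "Questions:\n" + numbered
    )

# Toy example of the JSON shape I'm working with.
transcript = json.loads(
    '[{"speaker": "agent", "text": "Hi, this is Dana from support."},'
    ' {"speaker": "customer", "text": "My internet is down."}]'
)
print(build_prompt(transcript))
```

So the model only needs to answer three short yes/no questions per call, nothing open-ended.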

Could you please suggest any suitable models for these requirements?

Thanks in advance!