Hi All,
I have a server with 250GB of RAM but no GPU. I’ve attempted to run some quantized Llama 3 models, such as:
- unsloth: Llama-3.3-70B-Instruct-Q5_K_M.gguf, Llama-3.3-70B-Instruct-Q3_K_M.gguf
However, I’ve been unable to load them due to RAM limitations.
I’m looking for an LLM that can run within my 250GB RAM setup. My tasks involve basic question-answering, such as analyzing a call transcript (in JSON format) between an agent and a customer to determine:
- Whether the agent introduced themselves,
- Whether the agent resolved the issue, or
- Whether the issue was escalated to a supervisor.
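For context, here is a minimal sketch of the kind of prompt I plan to build from a transcript. The `speaker`/`text` field names and the sample turns are just placeholders for how my JSON roughly looks, not the exact schema:

```python
import json

# Hypothetical transcript shape -- my real JSON uses similar speaker/text turns.
transcript_json = json.dumps([
    {"speaker": "agent", "text": "Hi, this is Dana from support. How can I help?"},
    {"speaker": "customer", "text": "My internet has been down since yesterday."},
    {"speaker": "agent", "text": "I have reset your line; it should be back up now."},
])

QUESTIONS = [
    "Did the agent introduce themselves?",
    "Did the agent resolve the issue?",
    "Was the issue escalated to a supervisor?",
]

def build_prompt(transcript_json: str) -> str:
    """Flatten the JSON turns into plain text and append the yes/no questions."""
    turns = json.loads(transcript_json)
    lines = [f"{t['speaker']}: {t['text']}" for t in turns]
    questions = "\n".join(f"- {q}" for q in QUESTIONS)
    return (
        "Call transcript:\n" + "\n".join(lines)
        + "\n\nAnswer each question with yes or no:\n" + questions
    )

print(build_prompt(transcript_json))
```

So the model only needs to read a short flattened transcript and answer three yes/no questions, nothing more elaborate than that.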
Could you please suggest any suitable models for these requirements?
Thanks in advance!