Hello everyone, are there any best practices for running an LLM with the llama.cpp server? I mean specific parameters that should be set when loading a model, regardless of its size. I'm trying to use TheBloke/Mixtral-8x7B-v0.1-GGUF, but it's quite large and sometimes it doesn't return any answer at all. The Ollama server, which also lets you pull models directly from the Ollama website, does a really good job of configuring the model automatically. Do you have any ideas on how to tune the llama.cpp server for TheBloke/Mixtral-8x7B-v0.1-GGUF? Any advice would be appreciated. Thank you!
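For reference, this is roughly how I'm launching it now. The flags are the documented llama.cpp server options; the quant file name is the Q4_K_M file from TheBloke's repo, and the `-c`/`-ngl`/`-t` values are just what I've been experimenting with, not anything I know to be correct:

```bash
# Sketch of my current launch command (assumes a GPU-enabled llama.cpp build).
#   -m    path to the GGUF file (Q4_K_M of Mixtral 8x7B is roughly 26 GB)
#   -c    context window; smaller values use less memory
#   -ngl  number of layers offloaded to the GPU (0 = pure CPU)
#   -t    CPU threads for the layers that stay on the CPU
./llama-server \
  -m ./mixtral-8x7b-v0.1.Q4_K_M.gguf \
  -c 4096 \
  -ngl 20 \
  -t 8 \
  --host 127.0.0.1 --port 8080
```

I then test it against the server's `/completion` endpoint, e.g.:

```bash
curl http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain mixture-of-experts in one sentence.", "n_predict": 128}'
```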