Load model efficiently using llama.cpp

Hello everyone, are there any best practices for serving an LLM with the llama.cpp server? I mean specific parameters that should be set when loading the model, regardless of its size. I'm trying to use TheBloke/Mixtral-8x7B-v0.1-GGUF, but it's quite large and sometimes it doesn't return any answer at all. The Ollama server, which can also pull models from the Ollama library, does a really good job of configuring the model automatically when it loads. Do you have any ideas on how to tune the llama.cpp server for TheBloke/Mixtral-8x7B-v0.1-GGUF? Any advice would be appreciated. Thank you!
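
To make it concrete, below is roughly the kind of launch command I mean. The paths and numbers are just placeholders, and the exact flag names can differ between llama.cpp versions (newer builds ship a `llama-server` binary, older ones call it `server`), so please treat this as a sketch rather than a known-good configuration:

```bash
# Hypothetical example: serve a Mixtral GGUF quant with explicit loading parameters.
# Adjust the model path, context size, GPU offload, and thread count to your hardware.
./llama-server \
  -m ./models/mixtral-8x7b-v0.1.Q4_K_M.gguf \  # path to the downloaded GGUF file (placeholder)
  -c 4096 \                                     # context size in tokens
  -ngl 20 \                                     # number of layers to offload to the GPU (0 = CPU only)
  -t 8 \                                        # CPU threads for generation
  --host 0.0.0.0 --port 8080                    # where the HTTP server listens
```

My main uncertainty is how to pick values like `-ngl` and `-c` so the model fits in memory and still responds, which is what Ollama seems to figure out on its own.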