In Ollama, you can bake a custom system prompt into a model with a Modelfile, or you can use an existing model and pass the instructions as part of each prompt. Which one is faster in terms of token processing? And does a system prompt retain the processed tokens between sessions?
In my testing, it does seem to get faster across runs.
Answered by Hugging Chat: https://huggingface.co/chat/
In Ollama, defining the system prompt in a Modelfile can be faster per request, because the prompt is fixed at model creation and loaded with the model, so clients don't have to resend and reprocess it on every call. The prompt also persists between sessions, since it's part of the model definition. The one-time `ollama create` step adds some upfront setup, though. Whether already-processed tokens are retained is a separate question: that depends on Ollama's configuration, in particular whether it runs as a persistent service that keeps the model (and its cache) in memory.
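For concreteness, here is a minimal sketch of the Modelfile approach. The base model name (`llama3`) and the prompt text are illustrative placeholders, not part of the answer above:

```
FROM llama3
SYSTEM """You are a concise assistant. Answer in short bullet points."""
```

Build it once with `ollama create my-assistant -f Modelfile`, then every `ollama run my-assistant` session starts with that system prompt already in place, with nothing extra to send per request.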
Answer
- Speed: Baking the system prompt into the Modelfile is generally faster, because the prompt is processed when the model is set up rather than being resent and reprocessed with every new request [1][2] (a per-request sketch follows this list).
- Token Retention: The system prompt itself is retained between sessions because it's part of the model's persistent definition. Already-processed tokens, however, are not retained automatically: Ollama must be configured to keep the model loaded, and the conversation context has to be carried forward explicitly [1][2][3] (see the context sketch after this list).
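For comparison, the alternative is to pass the system prompt with every request. A minimal sketch using the `ollama` Python client; the model name, prompt text, and `keep_alive` value are assumptions for illustration:

```python
import ollama

# Without a Modelfile, the system prompt is part of every request's
# payload, so its tokens count toward each request's prompt processing.
# keep_alive asks the server to keep the model loaded between requests,
# which avoids repaying the model-load cost (and lets the server reuse
# its prompt cache where possible).
response = ollama.chat(
    model="llama3",  # assumed base model name
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Is a Modelfile system prompt faster?"},
    ],
    keep_alive="10m",  # keep the model in memory for 10 minutes
)
print(response["message"]["content"])
```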
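On the retention question, the raw generate endpoint makes the processed tokens visible: its response includes a `context` value that can be passed back into the next call so earlier turns aren't reprocessed. Nothing is retained unless the caller round-trips it. A sketch under the same assumptions:

```python
import ollama

# First call: the prompt is evaluated; the response carries `context`,
# an encoding of the processed conversation so far.
first = ollama.generate(model="llama3", prompt="My name is Alice.")

# Second call: passing `context` back continues from the already-
# processed tokens instead of starting from scratch.
second = ollama.generate(
    model="llama3",
    prompt="What is my name?",
    context=first["context"],
)
print(second["response"])
```

In short: `keep_alive` controls whether the model stays in memory, while carrying `context` forward is what actually preserves processed tokens across calls.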
References
[1] Defining the system prompt in a Modelfile removes the need to resend and reprocess it on each request.
[2] Running Ollama as a persistent service keeps the model loaded, which affects whether context is retained.
[3] The Modelfile's SYSTEM instruction bakes the prompt into the model, so it is loaded with the model.