We achieved 91.7% accuracy on the MMLU benchmark (14,042 test questions) using a simple two-stage zero-shot prompting strategy we call TTR (Think Then Respond):
Implementation is straightforward: just two prompts, with the second prompt including the thoughts generated by the first:
thoughtPrompt = "How should you best think about this? Explain your thought process step by step."
outputPrompt = "Output only a single digit representing your choice (with no additional commentary)"
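The two-stage flow can be sketched as follows. This is a minimal illustration, not the released implementation: `generate` is a placeholder for any completion function (e.g. a vLLM or OpenAI-compatible client), and the question/choice formatting is an assumption.

```python
# Sketch of the two-stage TTR (Think Then Respond) flow.
# `generate` stands in for any LLM completion callable (str -> str),
# e.g. a vLLM client -- an assumption, not part of the original release.

THOUGHT_PROMPT = ("How should you best think about this? "
                  "Explain your thought process step by step.")
OUTPUT_PROMPT = ("Output only a single digit representing your choice "
                 "(with no additional commentary)")

def ttr_answer(question: str, choices: list[str], generate) -> str:
    """Run both TTR stages and return the model's single-digit choice."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(choices))
    base = f"{question}\n{numbered}"

    # Stage 1: elicit free-form reasoning ("think").
    thoughts = generate(f"{base}\n\n{THOUGHT_PROMPT}")

    # Stage 2: feed the generated thoughts back in and ask for
    # just the answer digit ("respond").
    answer = generate(f"{base}\n\n{thoughts}\n\n{OUTPUT_PROMPT}")
    return answer.strip()
```

Keeping the two stages as separate calls means the model commits its reasoning to text before being constrained to a single-digit output, which is the core of the technique.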
This exceeds more complex approaches such as DeepSeek R1's reported 90.8%, a pass@1 figure estimated by averaging over 64 samples per question.
We used the hugging-quants 4-bit quantized version of Meta Llama 3.1 405B, running inference with vLLM.
Open source: https://github.com/the-othernet/ttr-prompting