91.7% on MMLU with Llama 3.1 405B AWQ 4-bit

We achieved 91.7% accuracy on the MMLU benchmark (100k questions) using a simple two-stage zero-shot prompting strategy we call TTR (Think Then Respond).

The implementation is straightforward: just two prompts, with the final prompt including the thoughts generated by the first:

thoughtPrompt = "How should you best think about this? Explain your thought process step by step." 

outputPrompt = "Output only a single digit representing your choice (with no additional commentary)"
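For reference, here is a minimal sketch of the two-stage flow against an OpenAI-compatible endpoint (such as one served by vLLM). The endpoint URL, model id, and exact message framing are illustrative assumptions; see the repo for the actual implementation.

```python
# Minimal sketch of TTR (Think Then Respond) against an OpenAI-compatible endpoint.
# base_url, model id, and message framing are assumptions for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4"  # assumed model id

THOUGHT_PROMPT = "How should you best think about this? Explain your thought process step by step."
OUTPUT_PROMPT = "Output only a single digit representing your choice (with no additional commentary)"

def ttr_answer(question: str) -> str:
    # Stage 1: ask the model to think through the question.
    thoughts = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": f"{question}\n\n{THOUGHT_PROMPT}"}],
    ).choices[0].message.content

    # Stage 2: feed the generated thoughts back and request only the final digit.
    answer = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "user", "content": f"{question}\n\n{THOUGHT_PROMPT}"},
            {"role": "assistant", "content": thoughts},
            {"role": "user", "content": OUTPUT_PROMPT},
        ],
    ).choices[0].message.content

    return answer.strip()
```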

This exceeds more complex approaches such as DeepSeek R1’s 90.8%, which requires 64 sampling attempts per question for its pass@1 score.

We used the hugging-quants AWQ 4-bit quantized version of Meta Llama 3.1 405B, served with vLLM for inference.
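On the serving side, here is a sketch of loading the AWQ checkpoint with vLLM's offline Python API. The model id and tensor-parallel setting are assumptions for illustration, not necessarily the exact configuration used for the run above.

```python
# Sketch: loading a 4-bit AWQ checkpoint with vLLM's offline API.
# Model id and tensor_parallel_size are assumptions, not the exact benchmark setup.
from vllm import LLM, SamplingParams

llm = LLM(
    model="hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4",
    quantization="awq",        # 4-bit AWQ weights
    tensor_parallel_size=8,    # shard the 405B model across 8 GPUs
)

params = SamplingParams(temperature=0.0, max_tokens=1024)
outputs = llm.generate(
    ["What is 2 + 2? Explain your thought process step by step."],
    params,
)
print(outputs[0].outputs[0].text)
```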

Open source: https://github.com/the-othernet/ttr-prompting
