I am a physician trying to learn LLM technology to solve some of the problems in oncology. I have a remote background in Computer and Electrical Engineering (25+ years ago). I am trying to find a local LLM that can solve some hard problems, as a way to select a very strong math reasoning model for my research. Here are a few problems (from the Putnam competition):
(1) For each positive integer k, let A(k) be the number of odd divisors of k in the interval [1, √(2k)). Evaluate
∑_{k=1}^{∞} (−1)^(k−1) * A(k)/k
ANSWER: π^2/16
(2) Let n be a positive integer. For i and j in {1, 2, ..., n}, let s(i, j) be the number of pairs (a, b) of nonnegative integers satisfying ai + bj = n. Let S be the n-by-n matrix whose (i, j) entry is s(i, j). For example, when n = 5, we have the matrix S =
6 3 2 2 2
3 0 1 0 1
2 1 0 0 1
2 0 0 0 1
2 1 1 1 2
Compute the determinant of S.
ANSWER: (−1)^(⌈n/2⌉−1) * 2 * ⌈n/2⌉, where ⌈·⌉ is the ceiling function.
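The stated answer for (2) can be sanity-checked by brute force for small n, which is also a handy way to grade a model's output automatically. The following is only a sketch: the helper names (`s`, `build_S`, `det`, `closed_form`) are made up for illustration, and the determinant is computed exactly with rational arithmetic rather than floating point.

```python
from fractions import Fraction


def s(i, j, n):
    """Count pairs (a, b) of nonnegative integers with a*i + b*j == n."""
    return sum(1 for a in range(n // i + 1) if (n - a * i) % j == 0)


def build_S(n):
    """Build the n-by-n matrix whose (i, j) entry is s(i, j), 1-indexed."""
    return [[s(i, j, n) for j in range(1, n + 1)] for i in range(1, n + 1)]


def det(matrix):
    """Exact determinant via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    size = len(m)
    result = Fraction(1)
    for col in range(size):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return result


def closed_form(n):
    """The stated answer: (-1)^(ceil(n/2) - 1) * 2 * ceil(n/2)."""
    h = (n + 1) // 2  # ceil(n/2) for positive integers
    return (-1) ** (h - 1) * 2 * h


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, det(build_S(n)), closed_form(n))
```

For n = 5, `build_S(5)` reproduces the example matrix above, so the same script doubles as a check that the problem statement was transcribed correctly.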
(3) Let c_0, c_1, c_2, ... be the sequence defined so that
(1 − 3x − √(1 − 14x + 9x^2)) / 4 = ∑_{k=0}^{∞} c_k * x^k
for sufficiently small x. For a positive integer n, let A be the n-by-n matrix with (i, j)-entry c_(i+j−1) for i and j in {1, ..., n}. Find the determinant of A.
ANSWER: 10^(n(n-1)/2)
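Problem (3) can be checked the same way for small n. The sketch below relies on a step that is not in the problem statement: writing y for the generating function and squaring √(1 − 14x + 9x^2) = 1 − 3x − 4y gives y = x + 3xy + 2y^2, which yields c_0 = 0, c_1 = 1 and c_k = 3*c_(k−1) + 2*∑_{i=1}^{k−1} c_i*c_(k−i) for k ≥ 2. The function names (`coefficients`, `hankel_det`) are illustrative only.

```python
from fractions import Fraction


def coefficients(count):
    """Return [c_0, ..., c_(count-1)] from the relation y = x + 3xy + 2y^2."""
    c = [0, 1]
    for k in range(2, count):
        conv = sum(c[i] * c[k - i] for i in range(1, k))
        c.append(3 * c[k - 1] + 2 * conv)
    return c[:count]


def det(matrix):
    """Exact determinant via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    size = len(m)
    result = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return result


def hankel_det(n):
    """Determinant of the n-by-n matrix with (i, j) entry c_(i+j-1)."""
    c = coefficients(2 * n)  # the largest index needed is 2n - 1
    A = [[c[i + j - 1] for j in range(1, n + 1)] for i in range(1, n + 1)]
    return det(A)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, hankel_det(n), 10 ** (n * (n - 1) // 2))
```

The first few coefficients come out as 0, 1, 5, 35, 295, 2765, so for n = 2 the determinant is 1*35 − 5*5 = 10, matching the stated answer.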
(4) For a real number a, let F_a(x) = ∑_{n≥1} n^a * e^(2n) * x^(n^2) for 0 ≤ x < 1. Find a real number c such that
lim_{x→1−} [ F_a(x) * e^(−1/(1−x)) ] = 0 for all a < c, and
lim_{x→1−} [ F_a(x) * e^(−1/(1−x)) ] = ∞ for all a > c.
ANSWER: -1/2
I have had some success with Qwen3 32B every few runs on some of these, but it gets them wrong most of the time. The first one (1) seems to be the hardest, and none of the models I tried solves it. I have a laptop with 64GB RAM and a 4090 (16GB), but I am thinking of upgrading to a bigger computer. Please let me know which are the strongest reasoning models you have worked with; those will probably have the highest chance of solving these. Each run takes 20 minutes to 3 hours per problem, so I have been at it after work for 2 weeks and have not been able to try many models. Qwen3 seems to be the strongest so far. But there are also Qwen3 44B and 48B… Not sure if these are stronger. Too many to try. If there are other models, please let me know. Note: most full models (DeepSeek, ChatGPT, etc.) do better but often still get it wrong, especially problem (1).