Project Overview
Developing a novel approach to improving LLMs’ mathematical reasoning through self-supervision, building on MATH-SHEPHERD’s success with Process Reward Models (PRMs). The method aims to internalize verification capability within the model itself while maintaining training stability through curriculum learning.
In essence, the method substitutes compute for large amounts of training data, shifting the focus from raw data quantity to the model’s own understanding and reasoning ability. As a result, it has the potential to generalize to novel problems, enabling LLMs to solve challenges they have never seen before, a closer approximation of genuine intelligence.
Key innovations:
- An internal Process Reward Model trained with reinforcement learning
- Stable training through a hierarchical curriculum framework
- Novel completion-based verification mechanism (sketched after this list)
- PPO-based optimization for mathematical reasoning
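As a minimal sketch of the completion-based verification idea: a reasoning step can be scored the way MATH-SHEPHERD’s soft-estimation scheme does, by sampling several completions from a partial solution and taking the fraction that reach the known final answer as the step reward. The helpers `sample_completions` and `extract_answer` below are hypothetical stand-ins for the model’s decoding and answer-parsing routines, and `k` is an assumed sample budget.

```python
from typing import Callable

def completion_based_step_reward(
    partial_solution: str,
    gold_answer: str,
    sample_completions: Callable[[str, int], list[str]],  # placeholder decoder
    extract_answer: Callable[[str], str],                 # placeholder parser
    k: int = 8,                                           # assumed sample budget
) -> float:
    """Score a partial reasoning trace by its completion success rate.

    Soft estimation in the spirit of MATH-SHEPHERD: sample k completions
    from the current partial solution and use the fraction that reach the
    known final answer as the step-level reward in [0, 1].
    """
    completions = sample_completions(partial_solution, k)
    hits = sum(extract_answer(c) == gold_answer for c in completions)
    return hits / k
```

This per-step score is what the PPO loop would consume as its process-level reward signal.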
Technical Approach
- Hierarchical learning framework for progressive capability building
- Multiple-completion mechanism for robust verification
- Reward function based on completion success rates
- Automated mastery verification system (see the sketch after this list)
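One plausible reading of the automated mastery verification system, offered as a sketch rather than the proposal’s actual design: track a rolling success rate on the current difficulty tier and promote the curriculum only when that rate clears a mastery threshold. The tier names, the 0.8 threshold, and the 200-problem window are illustrative assumptions.

```python
from collections import deque

class CurriculumGate:
    """Promote training to the next difficulty tier once a rolling
    success rate clears a mastery threshold (illustrative parameters)."""

    def __init__(self,
                 tiers=("prealgebra", "algebra", "intermediate_algebra"),
                 threshold=0.8,   # assumed mastery bar
                 window=200):     # assumed rolling-window size
        self.tiers = tiers
        self.level = 0
        self.threshold = threshold
        self.recent = deque(maxlen=window)  # rolling pass/fail record

    @property
    def current_tier(self) -> str:
        return self.tiers[self.level]

    def record(self, solved: bool) -> None:
        """Log one problem outcome and promote on sustained mastery."""
        self.recent.append(solved)
        if (len(self.recent) == self.recent.maxlen
                and sum(self.recent) / len(self.recent) >= self.threshold
                and self.level < len(self.tiers) - 1):
            self.level += 1
            self.recent.clear()  # fresh statistics for the new tier
```

During training, each solved or failed problem would be logged via `record`, and the data loader would draw new problems from `current_tier`.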
Prototype Goals
- Implement base verification mechanism
- Test curriculum learning structure
- Focus on algebra-level problems as a proof of concept
Resource Requirements
- Single GPU (A100/V100)
- 4-6 weeks development time
- ~100GB storage
- Scoped to an initial prototype phase
Background
PhD in Applied Mathematics with a focus on stochastic processes and optimization. Experience in complex system modeling and market dynamics.
Seeking
- GPU access for initial prototyping
- Technical feedback on implementation approach
- Collaboration opportunities
Detailed technical proposal and implementation plan available upon request.