🚧 ReTool: PyTorch Implementation of Strategic Tool Use in LLMs (Seeking Collaborators)

Hey everyone :waving_hand:

I’ve been working on a PyTorch implementation of the ReTool framework (ReTool: Reinforcement Learning for Strategic Tool Use in LLMs), aimed at research use, and I’m now sharing an early version of it:

:paperclip: HF Space: ReTool Implementation


:white_check_mark: What’s Implemented

  • Multi-turn tool-augmented generation with <code> and <interpreter> handling
  • KV-cache optimized rollouts
  • Interpreter token masking for loss exclusion
  • Sampling logic adapted from Hugging Face TRL (for PPO-style training)
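
To make the interpreter token masking concrete, here is a minimal sketch of the idea, not the actual code from the Space. It assumes the opening and closing interpreter tags each map to a single token id (the ids 100 and 101 below are made up for illustration), and the function name `interpreter_loss_mask` is hypothetical. Tokens produced by the tool, tags included, get a mask of 0 so they don't contribute to the loss.

```python
from typing import List


def interpreter_loss_mask(token_ids: List[int], open_id: int, close_id: int) -> List[int]:
    """Return 1 for tokens that count toward the loss and 0 for tokens
    inside an interpreter span (the opening and closing tags included).

    Assumption: each tag is a single token id in the tokenizer's vocabulary.
    """
    mask = []
    inside = False
    for tok in token_ids:
        if tok == open_id:
            inside = True
        mask.append(0 if inside else 1)
        if tok == close_id:
            inside = False
    return mask


# Hypothetical ids: 100 = <interpreter>, 101 = </interpreter>
ids = [5, 6, 100, 7, 8, 101, 9]
mask = interpreter_loss_mask(ids, open_id=100, close_id=101)
print(mask)  # [1, 1, 0, 0, 0, 0, 1]
```

In a PyTorch training loop, a mask like this would typically be converted to a tensor and multiplied into the per-token loss (normalizing by the mask sum), or the masked positions in the labels would be set to -100 so that `torch.nn.functional.cross_entropy` ignores them by default.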

:test_tube: What’s Still in Progress

  • Safe sandbox integration (mocked for now)
  • End-to-end testing and reward evaluation
  • Dataset cleanup and trainer polish
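
For anyone curious how a mocked sandbox can sit behind the multi-turn loop, here is a rough sketch under a few assumptions. It is not the code in the Space. The names `rollout`, `mock_generate`, and `mock_execute` are hypothetical, the model is replaced by a stub, and the "interpreter" is just a lookup table that never executes anything. The point is only that the executor is passed in as a callable, so a real sandbox could be swapped in later without touching the loop.

```python
import re
from typing import Callable

# Matches the first <code>...</code> block in a completion.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def rollout(prompt: str,
            generate: Callable[[str], str],
            execute: Callable[[str], str],
            max_turns: int = 4) -> str:
    """Alternate between generation and tool execution until the model
    stops emitting code or max_turns is reached."""
    text = prompt
    for _ in range(max_turns):
        completion = generate(text)
        text += completion
        match = CODE_RE.search(completion)
        if match is None:
            break  # no tool call in this turn, so treat it as the final answer
        result = execute(match.group(1).strip())
        # Feed the tool output back to the model inside interpreter tags.
        text += f"<interpreter>{result}</interpreter>"
    return text


def mock_generate(context: str) -> str:
    # Stub standing in for the LLM: asks for one computation, then answers.
    if "<interpreter>" not in context:
        return "Let me compute. <code>print(2 + 3)</code>"
    return " The answer is 5."


def mock_execute(code: str) -> str:
    # Mock sandbox: canned outputs only, nothing is actually run.
    return {"print(2 + 3)": "5"}.get(code, "error: unknown snippet")


out = rollout("Q: 2+3? ", mock_generate, mock_execute)
print(out)
```

With the stubs above, the loop runs two turns: the first completion contains a code block, the mock result is appended, and the second completion has no code, so the loop ends.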

:woman_raising_hand: Looking for Collaborators

I’d love to team up with people who are interested in:

  • Building the interpreter sandbox (Python execution engine)
  • Improving testing and training scaffolds
  • Tuning or validating on math-heavy datasets

I’m especially interested in collaborators who bring complementary skills (e.g., systems, data, RL eval).

Even if you’re not ready to contribute code, feel free to check it out and drop feedback.

Thanks — hope this can help more of us explore hybrid reasoning in LLMs!
