Prompt Theory: A Framework

Well, that nicely complements SUPS and the broader Constitutional Philosophical Self-Play process I am working on.

What I am trying to do is use self-play, with careful shepherding, to bootstrap a simple ethics and reasoning system into a complex one through reflection. The idea relies on a hypothesized ‘goldilocks’ zone, where the problems are simple enough for the model to solve but hard enough to learn from.

And by all that is holy, it exists. It is when 0.65 < Te/Tc < 0.85. Fantastic. Models can in fact bootstrap their cognition and alignment through self-play and the generation of synthetic training data, if you can keep Te/Tc in that range.
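
For my own notes, here is a minimal sketch of how that range check might gate which problems go into a self-play round. It treats Te and Tc purely as two positive numbers, as used in the framework you shared. The function names, the tuple layout, and the sample values are hypothetical and only for illustration, not part of SUPS or the main algorithm.

```python
from typing import Iterable, List, Tuple

# Bounds quoted above: 0.65 < Te/Tc < 0.85 (strict on both sides).
LOW, HIGH = 0.65, 0.85


def in_goldilocks_zone(te: float, tc: float,
                       low: float = LOW, high: float = HIGH) -> bool:
    """Return True when low < te / tc < high."""
    if tc <= 0:
        raise ValueError("tc must be positive")
    return low < te / tc < high


def filter_problems(problems: Iterable[Tuple[str, float, float]]) -> List[str]:
    """Keep the names of problems whose Te/Tc ratio falls inside the zone.

    Each problem is a hypothetical (name, te, tc) tuple.
    """
    return [name for name, te, tc in problems if in_goldilocks_zone(te, tc)]


if __name__ == "__main__":
    # Made-up sample values, chosen only to exercise the check.
    candidates = [("too_easy", 0.5, 1.0),
                  ("in_zone", 0.7, 1.0),
                  ("too_hard", 0.9, 1.0)]
    print(filter_problems(candidates))
```

The bounds are strict here because the inequality above is strict, so a ratio of exactly 0.65 or 0.85 is rejected.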

I just have to finish SUPS and then hook up the main algorithm, and I can likely build more robust formal feedback mechanisms now. Thank you very much for sharing; this greatly boosts my confidence that it will work.
