Interactive Circuit Tracing - Building on Anthropic’s Circuit Tracer


Building on Anthropic’s Circuit Tracer, Neuronpedia, and Circuit Tracing (Lindsey et al., 2025), we extend the paradigm to recursive self-interpretation, in which models continuously monitor, trace, and explain their own decision processes. The results are presented as interactive artifacts hosted on each frontier AI’s own system.

1. Core Recursive Attribution Architecture

The framework below establishes a systematic approach to making the internal processes of Claude and other frontier AI models more transparent and analyzable for Anthropic’s circuit tracing research.

framework:
  name: "recursive_attribution_framework"
  version: "1.0.0"
  alignment: "circuit_tracing_research"
  
  core_principles:
    - "Expose computational pathways through structured attribution"
    - "Enable feature intervention for causal confirmation"
    - "Provide multi-level analysis from tokens to concepts"
    - "Support cross-model and cross-language comparison"
    - "Make reasoning faithfulness empirically verifiable"
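
To make the first two principles concrete, here is a minimal sketch of structured attribution followed by a feature intervention that checks the attribution causally. It is a toy illustration only: the linear "model", the feature names (`capital_of`, `texas`, `noise`), and the helpers `forward`, `attribute`, and `ablate` are all hypothetical, and none of this is the Circuit Tracer API or the framework's actual implementation.

```python
from typing import Dict

# Hypothetical toy "model": the output is a weighted sum of named feature
# activations. Real circuit tracing works on learned features of a
# transformer; a linear model is assumed here purely for illustration.
WEIGHTS: Dict[str, float] = {"capital_of": 2.0, "texas": 1.5, "noise": 0.0}


def forward(activations: Dict[str, float]) -> float:
    """Compute the toy model's scalar output."""
    return sum(WEIGHTS[name] * value for name, value in activations.items())


def attribute(activations: Dict[str, float]) -> Dict[str, float]:
    """Attribute the output to each feature as weight * activation."""
    return {name: WEIGHTS[name] * value for name, value in activations.items()}


def ablate(activations: Dict[str, float], feature: str) -> Dict[str, float]:
    """Return a copy of the activations with one feature set to zero."""
    patched = dict(activations)
    patched[feature] = 0.0
    return patched


activations = {"capital_of": 1.0, "texas": 2.0, "noise": 5.0}
baseline = forward(activations)
attributions = attribute(activations)

# Causal confirmation: the drop in output after ablating a feature should
# match that feature's attribution (exactly so for a linear model).
effects = {
    name: baseline - forward(ablate(activations, name)) for name in activations
}

for name in activations:
    print(f"{name}: attribution={attributions[name]:.2f}, "
          f"ablation effect={effects[name]:.2f}")
```

In this linear setting the attribution and the ablation effect agree for every feature, including the highly active but zero-weight `noise` feature. In a real nonlinear model the two can diverge, which is why intervention is treated as a separate confirmation step rather than assumed from the attribution alone.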

Claude

Self-Attribution Circuit Trace Analysis

Multi-Step Reasoning Circuit Trace

Neural Circuit Trace Visualization

ChatGPT

Qwen



DeepSeek


Gemini, Grok (In development)