Hi everyone,
I’ve released sparse transcoders for Qwen2.5-VL-7B-Instruct on HuggingFace.
Repo: KokosDev/qwen2p5vl-7b-plt on Hugging Face
Technical specs:
- 28 transcoders (one per decoder layer)
- 8,192 features per layer
- Cross-layer predictive architecture (PLT)
- ~10% L0 sparsity
- Apache 2.0 licensed
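To make the spec list concrete, here is a minimal sketch of what a per-layer sparse transcoder with these dimensions looks like. This is an illustration only, not the released code: the hidden size (3584), the use of a ReLU + top-k activation, and the k value (~10% of 8,192 features) are my assumptions, and the actual checkpoints may use a different activation or sparsity mechanism.

```python
import torch
import torch.nn as nn

class SparseTranscoder(nn.Module):
    """Sketch of one per-layer transcoder: maps a layer's MLP input to a
    reconstruction of its MLP output through a wide, sparsely activated
    feature layer. Dimensions here are assumptions, not the repo's config."""

    def __init__(self, d_model=3584, n_features=8192, k=819):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)
        self.k = k  # ~10% of 8,192 features active, matching the stated L0

    def encode(self, x):
        acts = torch.relu(self.encoder(x))
        # Keep only the top-k activations per token; zero out the rest.
        topk = torch.topk(acts, self.k, dim=-1)
        sparse = torch.zeros_like(acts)
        sparse.scatter_(-1, topk.indices, topk.values)
        return sparse

    def forward(self, x):
        return self.decoder(self.encode(x))
```

With 28 of these (one per decoder layer), each token's MLP computation gets a sparse, interpretable feature decomposition at every layer.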
Use cases:
- Mechanistic interpretability
- Feature discovery
- Circuit analysis
- Model steering research
- Feature suppression or amplification
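As an illustration of the last two use cases, here is one common way feature suppression or amplification is done with a transcoder: scale a single feature's contribution in the reconstructed MLP output. The function below is a hypothetical helper, not part of the repo; it assumes you already have the feature activations and the decoder weight matrix for a layer.

```python
import torch

def steer(mlp_out, feats, W_dec, feature_idx, scale):
    """Hypothetical steering step. Given feature activations `feats`
    (..., n_features) and a decoder weight matrix W_dec
    (n_features, d_model), re-weight one feature's contribution to the
    MLP output: scale=0.0 suppresses it, scale>1.0 amplifies it."""
    contribution = feats[..., feature_idx : feature_idx + 1] * W_dec[feature_idx]
    return mlp_out + (scale - 1.0) * contribution
```

In practice this would run inside a forward hook on the target layer, so the steered output flows into the rest of the model.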
All code and documentation are included in the repo.
Next: Training the 32B version (64 layers, 12K features)!
Questions and feedback welcome.