Towards White Box Deep Learning

Hi, I’d like to share my paper that proposes a novel approach for building white-box neural networks.

The paper introduces semantic features as a general technique for controlled dimensionality reduction, somewhat reminiscent of Hinton’s capsules and the idea of “inverse rendering”. In short, semantic features aim to capture the core characteristic of any semantic entity - having many possible states but being at exactly one state at a time. This results in regularization that is strong enough to make the PoC neural network inherently interpretable and also robust to adversarial attacks - despite no form of adversarial training! The paper may be viewed as a manifesto for a novel white-box approach to deep learning.

As an independent researcher I’d be grateful for your feedback!