Zero Initialization in Deep Learning

Textbooks often state that zero initialization prevents learning, typically under symmetry assumptions.
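The textbook argument is easy to reproduce. The sketch below (a generic two-layer tanh network, not the paper's model) shows why: with every weight at zero, the hidden activations and the backpropagated signal both vanish, so the weight gradients are exactly zero.

```python
import numpy as np

# A two-layer tanh network with every weight and bias set to zero.
# Forward pass: h = tanh(x @ W1 + b1), y_hat = h @ W2 + b2
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))          # batch of 8 inputs
y = rng.normal(size=(8, 1))          # regression targets

W1, b1 = np.zeros((4, 3)), np.zeros(3)
W2, b2 = np.zeros((3, 1)), np.zeros(1)

h = np.tanh(x @ W1 + b1)             # all zeros, since tanh(0) = 0
y_hat = h @ W2 + b2                  # all zeros
err = y_hat - y                      # dL/dy_hat for 0.5 * MSE

# Backprop by hand:
dW2 = h.T @ err                      # zero, because h is zero
dh = err @ W2.T                      # zero, because W2 is zero
dW1 = x.T @ (dh * (1 - h**2))        # zero, because dh is zero

# Only the output bias b2 receives a nonzero gradient, so this
# network can at best learn to predict the mean of the targets.
print(np.abs(dW2).max(), np.abs(dW1).max())  # 0.0 0.0 -> weights never move
```

This is the symmetry argument in its usual form; the discussion below concerns settings where it does not apply.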

However, we observe that learning can still proceed even when all weights and biases are initialized to zero, provided certain conditions are met.
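One classic setting where zero initialization is harmless (this is a standard textbook case, not necessarily the conditions the paper identifies) is a model with no hidden-layer symmetry to break, such as logistic regression:

```python
import numpy as np

# Logistic regression trained by gradient descent from an all-zero start.
# With no hidden layer, there are no symmetric units to disentangle,
# and the gradient is nonzero from the first step.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # linearly separable labels

w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))      # sigmoid predictions
    grad_w = X.T @ (p - y) / len(y)         # nonzero even at w = 0
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

acc = np.mean((p > 0.5) == y)
print(acc)   # high accuracy despite starting from all zeros
```

The point of the sketch is only that "zero init prevents learning" already depends on architecture; the paper's claim about deeper models rests on its own, stronger conditions.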

https://www.researchsquare.com/article/rs-4890533/v3

This paper explores both partial and full zero initialization. While full zero initialization appears to conflict with common textbook statements, we identify settings in which learning remains possible.

For example, a model with two million parameters, all initialized to zero, can still be trained successfully under specific conditions.

These observations suggest that the textbook claim is more conditional than its usual presentation implies.
