A Central Limit Theorem for Deep Neural Networks and Products of Random Matrices
We study the output-input Jacobian matrix of deep ReLU neural networks initialized with random weights. We reduce the problem to studying certain products of random matrices and show that the norms of the columns of this matrix are approximately log-normally distributed. The result holds for a large class of random weight distributions. The variance of the limiting distribution depends on the depth-to-width aspect ratio of the network; this provides an explanation for why very deep networks can suffer from the "vanishing and exploding gradient" problem that makes them difficult to train. Based on joint work with Boris Hanin.
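The claimed log-normality is easy to probe numerically. Below is a minimal simulation sketch (not part of the paper): it builds the Jacobian of a random ReLU network as a product of matrices D_l W_l, where D_l is the diagonal matrix of ReLU derivatives at layer l, and records the log of one column norm over many random draws. The width, depth, number of trials, and the He-style weight scaling are illustrative assumptions; under the stated result, the empirical variance of the log-norms should scale with the depth-to-width ratio.

```python
# Minimal numerical sketch (assumptions: width, depth, trials, He-style scaling).
# Checks that the log of a Jacobian column norm in a random ReLU network has a
# roughly Gaussian spread, i.e. the norm itself is approximately log-normal.
import numpy as np

rng = np.random.default_rng(0)
width, depth, trials = 100, 20, 2000
log_norms = []

for _ in range(trials):
    h = rng.standard_normal(width)      # random input
    J = np.eye(width)                   # running Jacobian product
    for _ in range(depth):
        # Random weights with variance 2/width (He-style scaling, an assumption here).
        W = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
        pre = W @ h
        # Diagonal matrix of ReLU derivatives (0/1) at this layer.
        D = np.diag((pre > 0).astype(float))
        J = D @ W @ J                   # Jacobian is the product D_L W_L ... D_1 W_1
        h = np.maximum(pre, 0.0)
    # Log of the norm of the first Jacobian column.
    log_norms.append(np.log(np.linalg.norm(J[:, 0])))

log_norms = np.array(log_norms)
print(f"mean of log column norms:     {log_norms.mean():.3f}")
print(f"variance of log column norms: {log_norms.var():.3f}")
```

Rerunning the sketch with a larger depth at fixed width should visibly increase the variance of the log-norms, which is the mechanism behind vanishing and exploding gradients in very deep networks.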