
== Learning theory for deep learning ==
It has been mathematically proven that controlling the information bottleneck is one way to control [[generalization error]] in deep learning.<ref>Kenji Kawaguchi, Zhun Deng, Xu Ji, Jiaoyang Huang. [https://proceedings.mlr.press/v202/kawaguchi23a.html "How Does Information Bottleneck Help Deep Learning?"] Proceedings of the 40th International Conference on Machine Learning, PMLR 202:16049-16096, 2023.</ref> Specifically, the generalization error is proven to scale as <math>\tilde O\left(\sqrt{\frac{I(X,T)+1}{n}}\right)</math>, where <math>n</math> is the number of training samples, <math>X</math> is the input to a deep neural network, and <math>T</math> is the output of a hidden layer. This generalization bound scales with the degree of information bottleneck, unlike other generalization bounds, which scale with the number of parameters, [[VC dimension]], [[Rademacher complexity]], stability, or robustness.
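
As a rough illustration of how this rate behaves, the following minimal sketch evaluates only the leading term <math>\sqrt{(I(X,T)+1)/n}</math>. It assumes that the constants and logarithmic factors hidden by the <math>\tilde O</math> notation are dropped, so the result shows the scaling only and is not an actual bound on the error. The function name <code>ib_bound_scale</code> and the sample values are hypothetical and do not come from the cited paper.

```python
import math


def ib_bound_scale(mutual_info: float, n_samples: int) -> float:
    """Leading term sqrt((I(X,T) + 1) / n) of the stated rate.

    Assumption: constants and log factors hidden by the O-tilde
    notation are omitted, so this is a scaling illustration only.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if mutual_info < 0:
        raise ValueError("mutual information cannot be negative")
    return math.sqrt((mutual_info + 1.0) / n_samples)


# Hypothetical values: same sample size, different amounts of
# information retained by the hidden layer.
tight = ib_bound_scale(mutual_info=3.0, n_samples=10_000)
loose = ib_bound_scale(mutual_info=15.0, n_samples=10_000)
```

In this sketch, a hidden layer that retains less information about the input yields a smaller term at the same sample size, and quadrupling <math>n</math> halves the term.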