{{Short description|Technique in information theory}}
The '''information bottleneck method''' is a technique in [[information theory]] introduced by [[Naftali Tishby]], Fernando C. Pereira, and [[William Bialek]].<ref name=":0">{{cite conference |url=http://www.cs.huji.ac.il/labs/learning/Papers/allerton.pdf|title=The Information Bottleneck Method|conference=The 37th annual Allerton Conference on Communication, Control, and Computing|last1=Tishby|first1=Naftali|author-link1=Naftali Tishby|last2=Pereira|first2=Fernando C.|last3=Bialek|first3=William|author-link3=William Bialek|date=September 1999|pages=368–377}}</ref> It is designed to find the best tradeoff between [[accuracy]] and complexity ([[Data compression|compression]]) when [[random variable|summarizing]] (e.g. [[data clustering|clustering]]) a [[random variable]] '''X''', given a [[joint probability distribution]] '''p(X,Y)''' between '''X''' and an observed relevant variable '''Y'''. Its authors describe it as providing ''"a surprisingly rich framework for discussing a variety of problems in signal processing and learning"''.<ref name=":0"/> Applications include distributional clustering and [[dimension reduction]], and more recently it has been suggested as a theoretical foundation for [[deep learning]].

The method generalizes the classical notion of minimal [[sufficient statistics]] from [[parametric statistics]] to arbitrary distributions, not necessarily of exponential form. It does so by relaxing the sufficiency condition to capture some fraction of the [[mutual information]] with the relevant variable '''Y'''. Convexified and entropy-regularized formulations have also been proposed, which use symbolic continuation algorithms to avoid bifurcation-induced instabilities along the <math>\beta</math> trade-off.<ref>{{Cite arXiv |arxiv=2505.09239 |title=Stable and Convexified Information Bottleneck Optimization via Symbolic Continuation and Entropy-Regularized Trajectories |last=Alpay |first=Faruk |date=2025-05-14}}</ref>

The information bottleneck can also be viewed as a [[Rate–distortion theory|rate distortion]] problem, with a distortion function that measures how well '''Y''' is predicted from a compressed representation '''T''' compared to its direct prediction from '''X'''. This interpretation provides a general iterative algorithm for solving the information bottleneck trade-off and calculating the information curve from the distribution '''p(X,Y)'''.

Let the compressed representation be given by random variable <math>T</math>. The algorithm minimizes the following functional with respect to the conditional distribution <math>p(t|x)</math>:

: <math> \inf_{p(t|x)} \,\, \Big( I(X;T) - \beta I(T;Y) \Big),</math>

where <math>I(X;T)</math> and <math>I(T;Y)</math> are the mutual information of <math>X</math> and <math>T</math>, and of <math>T</math> and <math>Y</math>, respectively, and <math>\beta</math> is a [[Lagrange multiplier]].
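Minimizing this functional over <math>p(t|x)</math> yields a set of self-consistent equations,

: <math>p(t|x) = \frac{p(t)}{Z(x,\beta)} \exp\!\big(-\beta\, D_{\mathrm{KL}}\big[p(y|x)\,\|\,p(y|t)\big]\big), \qquad p(t) = \sum_x p(x)\,p(t|x), \qquad p(y|t) = \frac{1}{p(t)} \sum_x p(y|x)\,p(t|x)\,p(x),</math>

where <math>Z(x,\beta)</math> is a normalization factor. Iterating these three updates to convergence, in the style of the [[Blahut–Arimoto algorithm]], gives the iterative algorithm mentioned above. A minimal illustrative sketch for a finite joint distribution follows; the function name, random initialization, and fixed iteration count are implementation choices, not part of the original formulation:

<syntaxhighlight lang="python">
import numpy as np

def information_bottleneck(p_xy, n_clusters, beta, n_iter=200, seed=0, eps=1e-12):
    """Iterate the information bottleneck self-consistent equations.

    p_xy: joint distribution p(x, y) as a (|X|, |Y|) array summing to 1.
    Returns the soft encoder p(t|x), marginal p(t), and decoder p(y|t).
    """
    rng = np.random.default_rng(seed)
    p_x = p_xy.sum(axis=1)                     # marginal p(x)
    p_y_given_x = p_xy / (p_x[:, None] + eps)  # conditional p(y|x)

    # Random soft assignment p(t|x), shape (|X|, |T|), rows sum to 1.
    p_t_given_x = rng.random((len(p_x), n_clusters))
    p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # p(t) = sum_x p(x) p(t|x)
        p_t = p_x @ p_t_given_x
        # p(y|t) = (1/p(t)) sum_x p(y|x) p(t|x) p(x)
        p_y_given_t = (p_t_given_x * p_x[:, None]).T @ p_y_given_x
        p_y_given_t /= p_t[:, None] + eps
        # KL divergence D[p(y|x) || p(y|t)] for every (x, t) pair.
        log_ratio = np.log(p_y_given_x[:, None, :] + eps) \
                  - np.log(p_y_given_t[None, :, :] + eps)
        kl = (p_y_given_x[:, None, :] * log_ratio).sum(axis=2)
        # p(t|x) proportional to p(t) exp(-beta * KL), renormalized over t.
        p_t_given_x = p_t[None, :] * np.exp(-beta * kl)
        p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True) + eps

    return p_t_given_x, p_t, p_y_given_t

# Toy joint distribution over |X| = 4 and |Y| = 2.
p_xy = np.array([[0.20, 0.05],
                 [0.15, 0.10],
                 [0.05, 0.20],
                 [0.10, 0.15]])
encoder, p_t, decoder = information_bottleneck(p_xy, n_clusters=2, beta=5.0)
</syntaxhighlight>

Small <math>\beta</math> drives the encoder toward a single cluster (maximal compression), while large <math>\beta</math> makes <math>p(t|x)</math> nearly deterministic; sweeping <math>\beta</math> traces out the information curve.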