== Applications ==
MCMC methods are primarily used for calculating [[Numerical analysis|numerical approximations]] of [[Multiple integral|multi-dimensional integrals]], for example in [[Bayesian statistics]], [[computational physics]],<ref>{{Cite journal|last1=Kasim|first1=M.F.|last2=Bott|first2=A.F.A.|last3=Tzeferacos|first3=P.|last4=Lamb|first4=D.Q.|last5=Gregori|first5=G.|last6=Vinko|first6=S.M.|date=September 2019|title=Retrieving fields from proton radiography without source profiles|journal=Physical Review E|volume=100|issue=3|page=033208|doi=10.1103/PhysRevE.100.033208|pmid=31639953|arxiv=1905.12934|bibcode=2019PhRvE.100c3208K|s2cid=170078861}}</ref> [[computational biology]]<ref>{{Cite journal|last1=Gupta|first1=Ankur|last2=Rawlings|first2=James B.|date=April 2014|title=Comparison of Parameter Estimation Methods in Stochastic Chemical Kinetic Models: Examples in Systems Biology|journal=AIChE Journal|volume=60|issue=4|pages=1253–1268|doi=10.1002/aic.14409|pmc=4946376|pmid=27429455|bibcode=2014AIChE..60.1253G}}</ref> and [[computational linguistics]].<ref>See Gill 2008.</ref><ref>See Robert & Casella 2004.</ref>

=== Bayesian statistics ===
In Bayesian statistics, Markov chain Monte Carlo methods are typically used to calculate [[Moment (mathematics)|moments]] and [[credible interval]]s of [[posterior probability]] distributions. The use of MCMC methods makes it possible to compute large [[Bayesian network#Hierarchical models|hierarchical models]] that require integrations over hundreds to thousands of unknown parameters.<ref>{{cite book|last1=Banerjee|first1=Sudipto|last2=Carlin|first2=Bradley P.|last3=Gelfand|first3=Alan P.|title=Hierarchical Modeling and Analysis for Spatial Data|publisher=CRC Press|isbn=978-1-4398-1917-3|page=xix|edition=Second|date=2014-09-12}}</ref>

=== Statistical physics ===
Many contemporary research problems in statistical physics can be addressed by approximate solutions using Monte Carlo simulation, which provides valuable insight into the properties of complex systems. Monte Carlo methods are fundamental in computational physics, physical chemistry, and related disciplines, with broad applications including medical physics, where they are employed to model radiation transport for radiation dosimetry calculations.<ref>{{Cite journal |last1=Jia |first1=Xun |last2=Ziegenhein |first2=Peter |last3=Jiang |first3=Steve B. |date=2014-02-21 |title=GPU-based high-performance computing for radiation therapy |journal=Physics in Medicine and Biology |volume=59 |issue=4 |pages=R151–182 |doi=10.1088/0031-9155/59/4/R151 |issn=1361-6560 |pmc=4003902 |pmid=24486639 |bibcode=2014PMB....59R.151J}}</ref><ref>{{Cite journal |last=Rogers |first=D. W. O. |date=July 2006 |title=REVIEW: Fifty years of Monte Carlo simulations for medical physics |url=https://ui.adsabs.harvard.edu/abs/2006PMB....51R.287R/abstract |journal=Physics in Medicine and Biology |language=en |volume=51 |issue=13 |pages=R287–R301 |doi=10.1088/0031-9155/51/13/R17 |pmid=16790908 |bibcode=2006PMB....51R.287R |issn=0031-9155}}</ref> Instead of exhaustively analyzing all possible system states, the Monte Carlo method examines a randomly drawn subset of them to form a representative sample, from which accurate approximations of the system's characteristic properties are obtained. As the number of sampled states increases, the approximation error decreases.
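As a concrete illustration of these ideas, the following minimal sketch draws samples from a one-dimensional target with the random-walk Metropolis algorithm and reports a posterior mean and 95% credible interval. It assumes only Python with NumPy; the standard-normal target is a hypothetical stand-in for a posterior density, not an example taken from the cited sources.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Unnormalized log-density of a hypothetical 1-D posterior;
    # here a standard normal stands in for p(theta | data).
    return -0.5 * x**2

n_steps, step_size = 50_000, 1.0
samples = np.empty(n_steps)
x = 0.0  # initial state of the chain

for i in range(n_steps):
    proposal = x + step_size * rng.normal()        # symmetric random-walk proposal
    log_accept = log_target(proposal) - log_target(x)
    if np.log(rng.uniform()) < log_accept:         # Metropolis acceptance rule
        x = proposal
    samples[i] = x

samples = samples[5_000:]                          # discard burn-in
print("posterior mean ~", samples.mean())
print("95% credible interval ~", np.percentile(samples, [2.5, 97.5]))
</syntaxhighlight>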
=== Complex distribution sampling ===
[[File:Wikipedia logo langevin dynamics.gif|thumb|A simulation of sampling from a Wikipedia-logo-like distribution via Langevin dynamics and score matching.]]
Langevin dynamics is typically used in complex distribution sampling and generative modeling,<ref>{{Cite journal |last=Hinton |first=Geoffrey E. |date=2002-08-01 |title=Training Products of Experts by Minimizing Contrastive Divergence |url=https://ieeexplore.ieee.org/document/6789337 |journal=Neural Computation |volume=14 |issue=8 |pages=1771–1800 |doi=10.1162/089976602760128018 |pmid=12180402 |issn=0899-7667}}</ref><ref name=":0">{{Citation |last1=Song |first1=Yang |last2=Ermon |first2=Stefano |title=Generative modeling by estimating gradients of the data distribution |date=2019-12-08 |work=Proceedings of the 33rd International Conference on Neural Information Processing Systems |issue=1067 |pages=11918–11930 |url=https://dl.acm.org/doi/10.5555/3454287.3455354 |access-date=2025-04-28 |place=Red Hook, NY, USA |publisher=Curran Associates Inc.}}</ref> via an MCMC procedure. Specifically, given the probability density function <math>p(x)</math>, its log gradient <math>\nabla_x \log p(x)</math> is used as the score function, and the chain is initialized from a prior distribution <math>x_0 \sim p_0</math>. A chain is then built by the update
:<math>x_{i+1} = x_i + \epsilon \nabla_x \log p(x_i) + \sqrt{2\epsilon}\, z_i, \qquad z_i \sim \mathcal{N}(0, I),</math>
for <math>i = 0, \dots, K</math>. As <math>\epsilon \rightarrow 0</math> and <math>K \rightarrow \infty</math>, <math>x_K</math> converges to a sample from the target distribution <math>p(x)</math>. Hence, for a complex distribution whose probability density function is known but difficult to sample from directly, Langevin dynamics provides an alternative. In most cases, however, especially in generative modeling, neither the exact probability density function of the target distribution nor its score function <math>\nabla_x \log p(x)</math> is known. In this case, score matching methods<ref>{{Cite journal |last=Hyvärinen |first=Aapo |date=2005 |title=Estimation of Non-Normalized Statistical Models by Score Matching |url=https://jmlr.org/papers/v6/hyvarinen05a.html |journal=Journal of Machine Learning Research |volume=6 |issue=24 |pages=695–709 |issn=1533-7928}}</ref><ref name=":1">{{Cite journal |last=Vincent |first=Pascal |date=July 2011 |title=A Connection Between Score Matching and Denoising Autoencoders |url=https://ieeexplore.ieee.org/document/6795935 |journal=Neural Computation |volume=23 |issue=7 |pages=1661–1674 |doi=10.1162/NECO_a_00142 |pmid=21492012 |issn=0899-7667}}</ref><ref name=":2">{{Cite journal |last1=Song |first1=Yang |last2=Garg |first2=Sahaj |last3=Shi |first3=Jiaxin |last4=Ermon |first4=Stefano |date=2020-08-06 |title=Sliced Score Matching: A Scalable Approach to Density and Score Estimation |url=https://proceedings.mlr.press/v115/song20a |journal=Proceedings of the 35th Uncertainty in Artificial Intelligence Conference |language=en |publisher=PMLR |pages=574–584}}</ref> provide feasible solutions, minimizing the [[Fisher information metric|Fisher divergence]] between a parameterized score-based model <math>s_\theta(x)</math> and the true score function without access to the ground-truth data score. The score-based model can be fitted on a training dataset by [[stochastic gradient descent]].
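The update rule above translates directly into code. The following sketch runs unadjusted Langevin dynamics with an analytically known score; the two-component Gaussian mixture target and all parameter values are illustrative assumptions, not taken from the cited papers (Python with NumPy assumed).

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def score(x, mu1=-2.0, mu2=2.0):
    # Analytic score (gradient of log p) of an equal-weight mixture of
    # two unit-variance Gaussians centered at mu1 and mu2.
    w1 = np.exp(-0.5 * (x - mu1)**2)
    w2 = np.exp(-0.5 * (x - mu2)**2)
    return (-(x - mu1) * w1 - (x - mu2) * w2) / (w1 + w2)

eps, K = 1e-2, 10_000
x = rng.normal(size=1000)            # x_0 ~ p_0: a standard-normal prior

for _ in range(K):
    z = rng.normal(size=x.shape)     # z_i ~ N(0, I)
    x = x + eps * score(x) + np.sqrt(2 * eps) * z   # Langevin update

# x now approximates samples from the bimodal target
print("sample mean:", x.mean(), "sample std:", x.std())
</syntaxhighlight>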
In real cases, however, the training data occupy only a small region of the target distribution's support, and the estimated score function is inaccurate in low-density regions with few available data examples. To overcome this challenge, denoising score matching<ref name=":0" /><ref name=":1" /><ref name=":4">{{Cite journal |last1=Song |first1=Yang |last2=Ermon |first2=Stefano |date=2020-12-06 |title=Improved techniques for training score-based generative models |url=https://dl.acm.org/doi/abs/10.5555/3495724.3496767 |journal=Proceedings of the 34th International Conference on Neural Information Processing Systems |series=NIPS '20 |location=Red Hook, NY, USA |publisher=Curran Associates Inc. |pages=12438–12448 |isbn=978-1-7138-2954-6}}</ref> methods perturb the available data examples with noise of different scales, which improves the coverage of low-density regions, and use the perturbed examples as the training dataset for the score-based model. The choice of noise scales is delicate: noise that is too large corrupts the original data, while noise that is too small fails to spread the data into the low-density regions. Carefully crafted noise schedules<ref name=":0" /><ref name=":2" /><ref name=":4" /> are therefore applied for higher-quality generation.
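For concreteness, the following sketch illustrates denoising score matching at a single noise scale: a data point <math>x</math> is perturbed to <math>\tilde{x} = x + \sigma z</math>, and the model score is regressed onto the score of the Gaussian perturbation kernel, <math>-(\tilde{x} - x)/\sigma^2</math>. The one-parameter linear score model and the Gaussian toy data are hypothetical choices for illustration, not constructions from the cited papers (Python with NumPy assumed).

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Toy data from a hypothetical target: zero-mean Gaussian with variance 4.
x_data = 2.0 * rng.normal(size=5000)

def score_model(x, theta):
    # Hypothetical one-parameter score model s_theta(x) = theta * x;
    # the true score of N(0, v) is -x / v, so after perturbation the
    # optimum is near theta = -1 / (v + sigma**2).
    return theta * x

def dsm_loss_grad(theta, x, sigma):
    # Denoising score matching: perturb x with Gaussian noise of scale
    # sigma and regress the model score onto -(x_tilde - x) / sigma**2,
    # the score of the perturbation kernel.
    z = rng.normal(size=x.shape)
    x_tilde = x + sigma * z
    target = -(x_tilde - x) / sigma**2
    resid = score_model(x_tilde, theta) - target
    loss = np.mean(resid**2)
    grad = np.mean(2 * resid * x_tilde)   # d(loss)/d(theta)
    return loss, grad

theta, lr, sigma = 0.0, 1e-3, 0.5
for step in range(2000):                  # plain stochastic gradient descent
    loss, grad = dsm_loss_grad(theta, x_data, sigma)
    theta -= lr * grad

print("fitted theta:", theta, "expected ~", -1.0 / (4.0 + 0.5**2))
</syntaxhighlight>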