=== Complex Distribution Sampling ===
[[File:Wikipedia logo langevin dynamics.gif|thumb|A simulation of sampling from a Wikipedia-logo-like distribution via Langevin dynamics and score matching.]]
Langevin dynamics are typically used in complex distribution sampling and generative modeling,<ref>{{Cite journal |last=Hinton |first=Geoffrey E. |date=2002-08-01 |title=Training Products of Experts by Minimizing Contrastive Divergence |url=https://ieeexplore.ieee.org/document/6789337 |journal=Neural Computation |volume=14 |issue=8 |pages=1771–1800 |doi=10.1162/089976602760128018 |pmid=12180402 |issn=0899-7667}}</ref><ref name=":0">{{Citation |last1=Song |first1=Yang |title=Generative modeling by estimating gradients of the data distribution |date=2019-12-08 |work=Proceedings of the 33rd International Conference on Neural Information Processing Systems |issue=1067 |pages=11918–11930 |url=https://dl.acm.org/doi/10.5555/3454287.3455354 |access-date=2025-04-28 |place=Red Hook, NY, USA |publisher=Curran Associates Inc. |last2=Ermon |first2=Stefano}}</ref> via an MCMC procedure. Specifically, given the probability density function <math>p(x)</math>, we use its log gradient <math>\nabla_x \log p(x)</math> as the score function and start from a prior distribution <math>x_0 \sim p_0</math>. A chain is then built by
:<math> x_{i+1} = x_i + \epsilon \nabla_x \log p(x_i) + \sqrt{2 \epsilon}\, z_i, \quad z_i \sim \mathcal{N}(0, I) </math>
for <math>i=0, \dots, K-1</math>. As <math>\epsilon \rightarrow 0</math> and <math>K \rightarrow \infty</math>, <math>x_K</math> converges to a sample from the target distribution <math>p(x)</math>. For a complex distribution whose probability density function is known but difficult to sample from directly, Langevin dynamics can be applied as an alternative. In most cases, however, and especially in generative modeling, neither the exact probability density function of the target distribution nor its score function <math>\nabla_x \log p(x)</math> is known. In this case, score matching methods<ref>{{Cite journal |last=Hyvärinen |first=Aapo |date=2005 |title=Estimation of Non-Normalized Statistical Models by Score Matching |url=https://jmlr.org/papers/v6/hyvarinen05a.html |journal=Journal of Machine Learning Research |volume=6 |issue=24 |pages=695–709 |issn=1533-7928}}</ref><ref name=":1">{{Cite journal |last=Vincent |first=Pascal |date=July 2011 |title=A Connection Between Score Matching and Denoising Autoencoders |url=https://ieeexplore.ieee.org/document/6795935 |journal=Neural Computation |volume=23 |issue=7 |pages=1661–1674 |doi=10.1162/NECO_a_00142 |pmid=21492012 |issn=0899-7667}}</ref><ref name=":2">{{Cite journal |last1=Song |first1=Yang |last2=Garg |first2=Sahaj |last3=Shi |first3=Jiaxin |last4=Ermon |first4=Stefano |date=2020-08-06 |title=Sliced Score Matching: A Scalable Approach to Density and Score Estimation |url=https://proceedings.mlr.press/v115/song20a |journal=Proceedings of the 35th Uncertainty in Artificial Intelligence Conference |language=en |publisher=PMLR |pages=574–584}}</ref> provide feasible solutions, minimizing the [[Fisher information metric|Fisher divergence]] between a parameterized score-based model <math>s_\theta(x)</math> and the true score function without knowing the ground-truth data score. The score model can be estimated on a training dataset by [[stochastic gradient descent]].
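As a rough illustration of the update rule above (not taken from the cited works), the following Python sketch assumes the score function is available in closed form; the helper name <code>langevin_sample</code> and its parameters are illustrative only.
<syntaxhighlight lang="python">
import numpy as np

def langevin_sample(score_fn, x0, eps=1e-3, n_steps=1000, rng=None):
    """Draw an approximate sample by iterating the Langevin update
    x_{i+1} = x_i + eps * score(x_i) + sqrt(2 * eps) * z_i."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        z = rng.standard_normal(x.shape)
        x = x + eps * score_fn(x) + np.sqrt(2 * eps) * z
    return x

# Example: a 2-D standard Gaussian has score grad_x log p(x) = -x,
# so the chain should settle near the origin with unit spread.
score = lambda x: -x
sample = langevin_sample(score, x0=np.array([5.0, -5.0]), eps=1e-2, n_steps=5000)
print(sample)
</syntaxhighlight>
With a small step size <math>\epsilon</math> and many steps, the final iterate behaves like a draw from the target density; in practice a Metropolis correction (as in the Metropolis-adjusted Langevin algorithm) can be added to remove the discretization bias.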
In real cases, however, the training data occupy only a small region of the target distribution, so the estimated score function is inaccurate in low-density regions with few available data examples. To overcome this challenge, denoising score matching<ref name=":0" /><ref name=":1" /><ref name=":4">{{Cite journal |last1=Song |first1=Yang |last2=Ermon |first2=Stefano |date=2020-12-06 |title=Improved techniques for training score-based generative models |url=https://dl.acm.org/doi/abs/10.5555/3495724.3496767 |journal=Proceedings of the 34th International Conference on Neural Information Processing Systems |series=NIPS '20 |location=Red Hook, NY, USA |publisher=Curran Associates Inc. |pages=12438–12448 |isbn=978-1-7138-2954-6}}</ref> methods perturb the available data examples with noise of different scales, which improves the coverage of low-density regions, and use the perturbed examples as the training dataset for the score-based model. The choice of noise scales is delicate: noise that is too large corrupts the original data, while noise that is too small fails to spread the data into the low-density regions. Carefully crafted noise schedules<ref name=":0" /><ref name=":2" /><ref name=":4" /> are therefore applied for higher-quality generation.
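A minimal sketch of this objective is given below, assuming a trainable callable <code>score_model(x_noisy, sigma)</code> and a fixed 1-D tensor of noise scales; the function name and signature are hypothetical, and the <math>\sigma^2</math> weighting follows the multi-scale scheme described in the cited works.
<syntaxhighlight lang="python">
import torch

def dsm_loss(score_model, x, sigmas):
    """Denoising score matching loss over several noise scales.

    `score_model(x_noisy, sigma)` is assumed to return the estimated score
    at noise level sigma; `sigmas` is a 1-D tensor of noise scales.
    """
    # Pick one noise scale per training example.
    idx = torch.randint(len(sigmas), (x.shape[0],), device=x.device)
    sigma = sigmas[idx].view(-1, *([1] * (x.dim() - 1)))   # broadcastable shape
    z = torch.randn_like(x)
    x_noisy = x + sigma * z              # perturb the data with Gaussian noise
    target = -z / sigma                  # score of the Gaussian perturbation kernel
    pred = score_model(x_noisy, sigma)
    per_sample = (pred - target).pow(2).reshape(x.shape[0], -1).sum(dim=1)
    # Weight each scale by sigma^2 so all noise levels contribute comparably.
    return (sigmas[idx] ** 2 * per_sample).mean()
</syntaxhighlight>
Minimizing this loss over the dataset trains the score model to approximate the score of the noise-perturbed data distribution at every scale, after which annealed Langevin dynamics can be run from coarse to fine noise levels to generate samples.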