Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Optical flow
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
===Classical Models=== Many classical models use the intuitive assumption of ''brightness constancy''; that even if a point moves between frames, its brightness stays constant.<ref name="Fortun_Survey_2015">{{cite journal |last1=Fortun |first1=Denis |last2=Bouthemy |first2=Patrick |last3=Kervrann |first3=Charles|title=Optical flow modeling and computation: A survey |journal=Computer Vision and Image Understanding |date=2015-05-01 |volume=134 |pages=1–21 |doi=10.1016/j.cviu.2015.02.008 |url=https://www.sciencedirect.com/science/article/pii/S1077314215000429 |access-date=2024-12-23}}</ref> To formalise this intuitive assumption, consider two consecutive frames from a video sequence, with intensity <math>I(x, y, t)</math>, where <math>(x, y)</math> refer to pixel coordinates and <math>t</math> refers to time. In this case, the brightness constancy constraint is :<math> I(x, y, t) - I(x + u, y + v, t + 1) = 0, </math> where <math>\mathbf{w}:= (u, v)</math> is the displacement vector between a point in the first frame and the corresponding point in the second frame. By itself, the brightness constancy constraint cannot be solved for <math>u</math> and <math>v</math> at each pixel, since there is only one equation and two unknowns. This is known as the ''[[Motion perception#The aperture problem|aperture problem]]''. Therefore, additional constraints must be imposed to estimate the flow field.<ref name="Brox_2004">{{cite conference |url=http://link.springer.com/10.1007/978-3-540-24673-2_3 |title=High Accuracy Optical Flow Estimation Based on a Theory for Warping |last1=Brox |first1=Thomas |last2=Bruhn |first2=Andrés |last3=Papenberg |first3=Nils |last4=Weickert |first4=Joachim |date=2004 |publisher=Springer Berlin Heidelberg |book-title=Computer Vision - ECCV 2004 |pages=25–36 |location=Berlin, Heidelberg |doi=10.1007/978-3-540-24673-2_3 |conference=ECCV 2004}}</ref><ref name="Baker_2011">{{cite journal |last1=Baker |first1=Simon |last2=Scharstein |first2=Daniel |last3=Lewis |first3=J. P. |last4=Roth |first4=Stefan |last5=Black |first5=Michael J. |last6=Szeliski |first6=Richard |title=A Database and Evaluation Methodology for Optical Flow |journal=International Journal of Computer Vision |date=1 March 2011 |volume=92 |issue=1 |pages=1–31 |doi=10.1007/s11263-010-0390-2 |url=https://link.springer.com/article/10.1007/s11263-010-0390-2 |access-date=25 Dec 2024 |language=en |issn=1573-1405}}</ref> ==== Regularized Models ==== Perhaps the most natural approach to addressing the aperture problem is to apply a smoothness constraint or a ''regularization constraint'' to the flow field. One can combine both of these constraints to formulate estimating optical flow as an [[Optimization problem|optimization problem]], where the goal is to minimize the cost function of the form, :<math>E = \iint_\Omega \Psi(I(x + u, y + v, t + 1) - I(x, y, t)) + \alpha \Psi(|\nabla u|) + \alpha \Psi(|\nabla v|) dx dy, </math> where <math>\Omega</math> is the extent of the images <math>I(x, y)</math>, <math>\nabla</math> is the gradient operator, <math>\alpha</math> is a constant, and <math>\Psi()</math> is a [[loss function]].<ref name="Fortun_Survey_2015" /><ref name="Brox_2004" /> This optimisation problem is difficult to solve owing to its non-linearity. To address this issue, one can use a ''variational approach'' and linearise the brightness constancy constraint using a first order [[Taylor series]] approximation. Specifically, the brightness constancy constraint is approximated as, :<math>\frac{\partial I}{\partial x}u+\frac{\partial I}{\partial y}v+\frac{\partial I}{\partial t} = 0.</math> For convenience, the derivatives of the image, <math>\tfrac{\partial I}{\partial x}</math>, <math>\tfrac{\partial I}{\partial y}</math> and <math>\tfrac{\partial I}{\partial t}</math> are often condensed to become <math>I_x</math>, <math>I_y</math> and <math> I_t</math>. Doing so, allows one to rewrite the linearised brightness constancy constraint as,<ref name="Baker_2011" /> :<math>I_x u + I_y v+ I_t = 0.</math> The optimization problem can now be rewritten as :<math>E = \iint_\Omega \Psi(I_x u + I_y v + I_t) + \alpha \Psi(|\nabla u|) + \alpha \Psi(|\nabla v|) dx dy. </math> For the choice of <math>\Psi(x) = x^2</math>, this method is the same as the [[Horn-Schunck method]].<ref name="Horn_1980" /> Of course, other choices of cost function have been used such as <math>\Psi(x) = \sqrt{x^2 + \epsilon^2}</math>, which is a differentiable variant of the [[Taxicab geometry |<math>L^1</math> norm]].<ref name="Fortun_Survey_2015" /><ref>{{cite conference |url=https://ieeexplore.ieee.org/document/5539939 |title=Secrets of optical flow estimation and their principles |last1=Sun |first1=Deqing |last2=Roth |first2=Stefan |last3=Black |first3=Micahel J. |date=2010 |publisher=IEEE |book-title=2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition |pages= 2432–2439 |location=San Francisco, CA, USA |conference=2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition}}</ref> To solve the aforementioned optimization problem, one can use the [[Euler-Lagrange equations]] to provide a system of partial differential equations for each point in <math>I(x, y, t)</math>. In the simplest case of using <math>\Psi(x) = x^2</math>, these equations are, :<math> I_x(I_xu+I_yv+I_t) - \alpha \Delta u = 0,</math> :<math> I_y(I_xu+I_yv+I_t) - \alpha \Delta v = 0,</math> where <math>\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} </math> denotes the [[Laplace operator]]. Since the image data is made up of discrete pixels, these equations are discretised. Doing so yields a system of linear equations which can be solved for <math>(u, v)</math> at each pixel, using an iterative scheme such as [[Gauss-Seidel]].<ref name="Horn_1980" /> Although, linearising the brightness constancy constraint simplifies the optimisation problem significantly, the linearisation is only valid for small displacements and/or smooth images. To avoid this problem, a multi-scale or coarse-to-fine approach is often used. In such a scheme, the images are initially [[downsampling|downsampled]] and the linearised Euler-Lagrange equations are solved at the reduced resolution. The estimated flow field at this scale is then used to initialise the process at next scale.<ref>{{cite journal |last1=Meinhardt-Llopis |first1=Enric |last2=Pérez |first2=Javier Sánchez |last3=Kondermann |first3=Daniel |title=Horn-Schunck Optical Flow with a Multi-Scale Strategy |journal=Image Processing on Line |date=19 July 2013 |volume=3 |pages=151–172 |doi=10.5201/ipol.2013.20}}</ref> This initialisation process is often performed by [[image warping|warping]] one frame using the current estimate of flow field so that it is as similar to other as possible.<ref name="Brox_2004" /><ref>{{cite journal |last1=Black |first1=Michael J. |last2=Anandan |first2=P. |title=The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields |journal=Computer Vision and Image Understanding |date=1 January 1996 |volume=63 |issue=1 |pages=75–104 |doi=10.1006/cviu.1996.0006 |issn=1077-3142}}</ref> An alternate approach is to discretize the optimisation problem and then perform a search of the possible <math>(u, v)</math> values without linearising it.<ref>{{cite conference |url=https://ieeexplore.ieee.org/document/5459364 |title=Large Displacement Optical Flow Computation without Warping |last1=Steinbr¨ucker |first1=Frank |last2=Pock |first2=Thomas |last3=Cremers |first3=Daniel |last4=Weickert |first4=Joachim |date=2009 |publisher=IEEE |book-title=2009 IEEE 12th International Conference on Computer Vision |pages=1609–1614 |conference=2009 IEEE 12th International Conference on Computer Vision}}</ref> This search is often performed using [[Max-flow min-cut theorem]] algorithms, linear programming or [[belief propagation]] methods. ==== Parametric Models ==== Instead of applying the regularization constraint on a point by point basis as per a regularized model, one can group pixels into regions and estimate the motion of these regions. This is known as a ''parametric model'', since the motion of these regions is [[parameter|parameterized]]. In formulating optical flow estimation in this way, one makes the assumption that the motion field in each region be fully characterised by a set of parameters. Therefore, the goal of a parametric model is to estimate the motion parameters that minimise a loss function which can be written as, :<math> \hat{\boldsymbol{\alpha}} = \arg \min_{\boldsymbol{\alpha}} \sum_{(x, y) \in \mathcal{R}} g(x, y) \rho(x, y, I_1, I_2, u_{\boldsymbol{\alpha}}, v_{\boldsymbol{\alpha}}), </math> where <math>{\boldsymbol{\alpha}}</math> is the set of parameters determining the motion in the region <math>\mathcal{R}</math>, <math>\rho()</math> is data cost term, <math>g()</math> is a weighting function that determines the influence of pixel <math>(x, y)</math> on the total cost, and <math>I_1</math> and <math>I_2</math> are frames 1 and 2 from a pair of consecutive frames.<ref name="Fortun_Survey_2015" /> The simplest parametric model is the [[Lucas-Kanade method]]. This uses rectangular regions and parameterises the motion as purely translational. The Lucas-Kanade method uses the original brightness constancy constrain as the data cost term and selects <math>g(x, y) = 1</math>. This yields the local loss function, :<math> \hat{\boldsymbol{\alpha}} = \arg \min_{\boldsymbol{\alpha}} \sum_{(x, y) \in \mathcal{R}} | I(x + u_{\boldsymbol{\alpha}}, y + v_{\boldsymbol{\alpha}}, t + 1) - I(x, y, t)| . </math> Other possible local loss functions include the negative normalized [[cross-correlation]] between the two frames.<ref>{{cite conference |last1=Lucas |first1=Bruce D. |last2=Kanade |first2=Takeo |date=1981-08-24 |title=An iterative image registration technique with an application to stereo vision |url=https://dl.acm.org/doi/10.5555/1623264.1623280 |journal=Proceedings of the 7th International Joint Conference on Artificial Intelligence - Volume 2 |series=IJCAI'81 |location=San Francisco, CA, USA |publisher=Morgan Kaufmann Publishers Inc. |pages=674–679}}</ref>
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)