Editing Optical flow (section)

===Classical Models===

Many classical models use the intuitive assumption of ''brightness constancy''; that even if a point moves between frames, its brightness stays constant.<ref name="Fortun_Survey_2015">{{cite journal |last1=Fortun |first1=Denis |last2=Bouthemy |first2=Patrick |last3=Kervrann |first3=Charles|title=Optical flow modeling and computation: A survey |journal=Computer Vision and Image Understanding |date=2015-05-01 |volume=134 |pages=1–21 |doi=10.1016/j.cviu.2015.02.008 |url=https://www.sciencedirect.com/science/article/pii/S1077314215000429 |access-date=2024-12-23}}</ref>
To formalise this intuitive assumption, consider two consecutive frames from a video sequence, with intensity <math>I(x, y, t)</math>, where <math>(x, y)</math> refer to pixel coordinates and <math>t</math> refers to time.
In this case, the brightness constancy constraint is
:<math>
I(x, y, t) - I(x + u, y + v, t + 1) = 0,
</math>
where <math>\mathbf{w}:= (u, v)</math> is the displacement vector between a point in the first frame and the corresponding point in the second frame.
By itself, the brightness constancy constraint cannot be solved for <math>u</math> and <math>v</math> at each pixel, since there is only one equation and two unknowns.
This is known as the ''[[Motion perception#The aperture problem|aperture problem]]''.
Therefore, additional constraints must be imposed to estimate the flow field.<ref name="Brox_2004">{{cite conference |url=http://link.springer.com/10.1007/978-3-540-24673-2_3 |title=High Accuracy Optical Flow Estimation Based on a Theory for Warping |last1=Brox |first1=Thomas |last2=Bruhn |first2=Andrés |last3=Papenberg |first3=Nils |last4=Weickert |first4=Joachim |date=2004 |publisher=Springer Berlin Heidelberg |book-title=Computer Vision - ECCV 2004 |pages=25–36 |location=Berlin, Heidelberg |doi=10.1007/978-3-540-24673-2_3 |conference=ECCV 2004}}</ref><ref name="Baker_2011">{{cite journal |last1=Baker |first1=Simon |last2=Scharstein |first2=Daniel |last3=Lewis |first3=J. P. |last4=Roth |first4=Stefan |last5=Black |first5=Michael J. |last6=Szeliski |first6=Richard |title=A Database and Evaluation Methodology for Optical Flow |journal=International Journal of Computer Vision |date=1 March 2011 |volume=92 |issue=1 |pages=1–31 |doi=10.1007/s11263-010-0390-2 |url=https://link.springer.com/article/10.1007/s11263-010-0390-2 |access-date=25 Dec 2024 |language=en |issn=1573-1405}}</ref>

==== Regularized Models ====
Perhaps the most natural approach to addressing the aperture problem is to apply a smoothness constraint or a ''regularization constraint'' to the flow field.
One can combine both of these constraints to formulate estimating optical flow as an [[Optimization problem|optimization problem]], where the goal is to minimize the cost function of the form,
:<math>E = \iint_\Omega \Psi(I(x + u, y + v, t + 1) - I(x, y, t)) + \alpha \Psi(|\nabla u|) + \alpha \Psi(|\nabla v|) dx dy, </math>
where <math>\Omega</math> is the extent of the images <math>I(x, y)</math>, <math>\nabla</math> is the gradient operator, <math>\alpha</math> is a constant, and <math>\Psi()</math> is a [[loss function]].<ref name="Fortun_Survey_2015" /><ref name="Brox_2004" />

This optimisation problem is difficult to solve owing to its non-linearity.
To address this issue, one can use a ''variational approach'' and linearise the brightness constancy constraint using a first order [[Taylor series]] approximation. Specifically, the brightness constancy constraint is approximated as,
:<math>\frac{\partial I}{\partial x}u+\frac{\partial I}{\partial y}v+\frac{\partial I}{\partial t} = 0.</math>
For convenience, the derivatives of the image, <math>\tfrac{\partial I}{\partial x}</math>, <math>\tfrac{\partial I}{\partial y}</math> and <math>\tfrac{\partial I}{\partial t}</math> are often condensed to become <math>I_x</math>, <math>I_y</math> and <math> I_t</math>.
Doing so, allows one to rewrite the linearised brightness constancy constraint as,<ref name="Baker_2011" />
:<math>I_x u + I_y v+ I_t = 0.</math>
The optimization problem can now be rewritten as
:<math>E = \iint_\Omega \Psi(I_x u + I_y v + I_t) + \alpha \Psi(|\nabla u|) + \alpha \Psi(|\nabla v|) dx dy. </math>
For the choice of <math>\Psi(x) = x^2</math>, this method is the same as the [[Horn-Schunck method]].<ref name="Horn_1980" />
Of course, other choices of cost function have been used such as <math>\Psi(x) = \sqrt{x^2 + \epsilon^2}</math>, which is a differentiable variant of the [[Taxicab geometry |<math>L^1</math> norm]].<ref name="Fortun_Survey_2015" /><ref>{{cite conference |url=https://ieeexplore.ieee.org/document/5539939 |title=Secrets of optical flow estimation and their principles |last1=Sun |first1=Deqing |last2=Roth |first2=Stefan |last3=Black |first3=Micahel J. |date=2010 |publisher=IEEE |book-title=2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition |pages= 2432–2439 |location=San Francisco, CA, USA |conference=2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition}}</ref>

To solve the aforementioned optimization problem, one can use the [[Euler-Lagrange equations]] to provide a system of partial differential equations for each point in <math>I(x, y, t)</math>. In the simplest case of using <math>\Psi(x) = x^2</math>, these equations are,
:<math> I_x(I_xu+I_yv+I_t) - \alpha \Delta u = 0,</math>
:<math> I_y(I_xu+I_yv+I_t) - \alpha \Delta v = 0,</math>
where <math>\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} </math> denotes the [[Laplace operator]].
Since the image data is made up of discrete pixels, these equations are discretised.
Doing so yields a system of linear equations which can be solved for <math>(u, v)</math> at each pixel, using an iterative scheme such as [[Gauss-Seidel]].<ref name="Horn_1980" />

Although, linearising the brightness constancy constraint simplifies the optimisation problem significantly, the linearisation is only valid for small displacements and/or smooth images. To avoid this problem, a multi-scale or coarse-to-fine approach is often used. In such a scheme, the images are initially [[downsampling|downsampled]] and the linearised Euler-Lagrange equations are solved at the reduced resolution. The estimated flow field at this scale is then used to initialise the process at next scale.<ref>{{cite journal |last1=Meinhardt-Llopis |first1=Enric |last2=Pérez |first2=Javier Sánchez |last3=Kondermann |first3=Daniel |title=Horn-Schunck Optical Flow with a Multi-Scale Strategy |journal=Image Processing on Line |date=19 July 2013 |volume=3 |pages=151–172 |doi=10.5201/ipol.2013.20}}</ref> This initialisation process is often performed by [[image warping|warping]] one frame using the current estimate of flow field so that it is as similar to other as possible.<ref name="Brox_2004" /><ref>{{cite journal |last1=Black |first1=Michael J. |last2=Anandan |first2=P. |title=The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields |journal=Computer Vision and Image Understanding |date=1 January 1996 |volume=63 |issue=1 |pages=75–104 |doi=10.1006/cviu.1996.0006 |issn=1077-3142}}</ref>

An alternate approach is to discretize the optimisation problem and then perform a search of the possible <math>(u, v)</math> values without linearising it.<ref>{{cite conference |url=https://ieeexplore.ieee.org/document/5459364 |title=Large Displacement Optical Flow Computation without Warping |last1=Steinbr¨ucker |first1=Frank |last2=Pock |first2=Thomas |last3=Cremers |first3=Daniel |last4=Weickert |first4=Joachim |date=2009 |publisher=IEEE |book-title=2009 IEEE 12th International Conference on Computer Vision |pages=1609–1614 |conference=2009 IEEE 12th International Conference on Computer Vision}}</ref>
This search is often performed using [[Max-flow min-cut theorem]] algorithms, linear programming or [[belief propagation]] methods.

==== Parametric Models ====

Instead of applying the regularization constraint on a point by point basis as per a regularized model, one can group pixels into regions and estimate the motion of these regions.
This is known as a ''parametric model'', since the motion of these regions is [[parameter|parameterized]].
In formulating optical flow estimation in this way, one makes the assumption that the motion field in each region be fully characterised by a set of parameters.
Therefore, the goal of a parametric model is to estimate the motion parameters that minimise a loss function which can be written as,
:<math>
\hat{\boldsymbol{\alpha}} = \arg \min_{\boldsymbol{\alpha}} \sum_{(x, y) \in \mathcal{R}} g(x, y) \rho(x, y, I_1, I_2, u_{\boldsymbol{\alpha}}, v_{\boldsymbol{\alpha}}),
</math>
where <math>{\boldsymbol{\alpha}}</math> is the set of parameters determining the motion in the region <math>\mathcal{R}</math>, <math>\rho()</math> is data cost term, <math>g()</math> is a weighting function that determines the influence of pixel <math>(x, y)</math> on the total cost, and <math>I_1</math> and <math>I_2</math> are frames 1 and 2 from a pair of consecutive frames.<ref name="Fortun_Survey_2015" />

The simplest parametric model is the [[Lucas-Kanade method]]. This uses rectangular regions and parameterises the motion as purely translational. The Lucas-Kanade method uses the original brightness constancy constrain as the data cost term and selects <math>g(x, y) = 1</math>.
This yields the local loss function,
:<math>
\hat{\boldsymbol{\alpha}} = \arg \min_{\boldsymbol{\alpha}} \sum_{(x, y) \in \mathcal{R}} | I(x + u_{\boldsymbol{\alpha}}, y + v_{\boldsymbol{\alpha}}, t + 1) - I(x, y, t)| .
</math>
Other possible local loss functions include the negative normalized [[cross-correlation]] between the two frames.<ref>{{cite conference |last1=Lucas |first1=Bruce D. |last2=Kanade |first2=Takeo |date=1981-08-24 |title=An iterative image registration technique with an application to stereo vision |url=https://dl.acm.org/doi/10.5555/1623264.1623280 |journal=Proceedings of the 7th International Joint Conference on Artificial Intelligence - Volume 2 |series=IJCAI'81 |location=San Francisco, CA, USA |publisher=Morgan Kaufmann Publishers Inc. |pages=674–679}}</ref>