=== Keypoint localization ===
[[File:Sift keypoints filtering.jpg|thumb|After scale-space extrema are detected (their locations are shown in the uppermost image), the SIFT algorithm discards low-contrast keypoints (the remaining points are shown in the middle image) and then filters out those located on edges. The resulting set of keypoints is shown in the last image.]] Scale-space extrema detection produces too many keypoint candidates, some of which are unstable. The next step in the algorithm is to perform a detailed fit to the nearby data to determine accurate location, scale, and ratio of [[principal curvatures]]. This information allows the rejection of points that have low contrast (and are therefore sensitive to noise) or are poorly localized along an edge.

==== Interpolation of nearby data for accurate position ====
First, for each candidate keypoint, interpolation of nearby data is used to accurately determine its position. The original approach simply located each keypoint at the location and scale of the candidate keypoint.<ref name="Lowe1999" /> The later approach calculates the interpolated location of the extremum, which substantially improves matching and stability.<ref name="Lowe2004" /> The interpolation is done using the quadratic [[Taylor expansion]] of the Difference-of-Gaussian scale-space function, <math>D \left( x, y, \sigma \right)</math>, with the candidate keypoint as the origin. This Taylor expansion is given by:

:<math>D(\textbf{x}) = D + \frac{\partial D}{\partial \textbf{x}}^T\textbf{x} + \frac{1}{2}\textbf{x}^T \frac{\partial^2 D}{\partial \textbf{x}^2} \textbf{x}</math>

where <math>D</math> and its derivatives are evaluated at the candidate keypoint and <math>\textbf{x} = \left( x, y, \sigma \right)^T</math> is the offset from this point. The location of the extremum, <math>\hat{\textbf{x}}</math>, is determined by taking the derivative of this function with respect to <math>\textbf{x}</math> and setting it to zero. If the offset <math>\hat{\textbf{x}}</math> is larger than <math>0.5</math> in any dimension, this indicates that the extremum lies closer to another candidate keypoint. In this case, the candidate keypoint is changed and the interpolation is performed about that point instead. Otherwise, the offset is added to the candidate keypoint to get the interpolated estimate for the location of the extremum. A similar subpixel determination of the locations of scale-space extrema is performed in the real-time implementation based on hybrid pyramids developed by Lindeberg and his co-workers.<ref name="Lindenberg2003" />
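
Setting the derivative of the expansion to zero gives the offset <math>\hat{\textbf{x}} = -\left( \frac{\partial^2 D}{\partial \textbf{x}^2} \right)^{-1} \frac{\partial D}{\partial \textbf{x}}</math>.<ref name="Lowe2004" /> The following Python sketch illustrates this step. It is not taken from any reference implementation: the function names and the <code>cube[s][y][x]</code> layout are hypothetical, the derivatives are assumed to be approximated by central finite differences on the 3×3×3 neighbourhood of the candidate, and the 3×3 linear system is solved with Cramer's rule for brevity.

```python
def dog_derivatives(cube):
    """Gradient and Hessian of D at the centre of a 3x3x3 DoG neighbourhood.

    cube[s][y][x] holds DoG samples (assumed layout); the candidate
    keypoint is cube[1][1][1]. Derivatives are ordered (x, y, sigma).
    """
    c = cube
    d = c[1][1][1]
    # First derivatives: central differences.
    grad = [
        (c[1][1][2] - c[1][1][0]) / 2.0,
        (c[1][2][1] - c[1][0][1]) / 2.0,
        (c[2][1][1] - c[0][1][1]) / 2.0,
    ]
    # Second derivatives: standard 3-point and 4-point stencils.
    dxx = c[1][1][2] - 2.0 * d + c[1][1][0]
    dyy = c[1][2][1] - 2.0 * d + c[1][0][1]
    dss = c[2][1][1] - 2.0 * d + c[0][1][1]
    dxy = (c[1][2][2] - c[1][2][0] - c[1][0][2] + c[1][0][0]) / 4.0
    dxs = (c[2][1][2] - c[2][1][0] - c[0][1][2] + c[0][1][0]) / 4.0
    dys = (c[2][2][1] - c[2][0][1] - c[0][2][1] + c[0][0][1]) / 4.0
    hess = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    return grad, hess


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve a 3x3 system a @ x = b by Cramer's rule; None if singular."""
    d = det3(a)
    if abs(d) < 1e-12:
        return None
    solution = []
    for k in range(3):
        m = [row[:] for row in a]
        for i in range(3):
            m[i][k] = b[i]  # replace column k with the right-hand side
        solution.append(det3(m) / d)
    return solution


def interpolated_offset(cube):
    """Offset x_hat = -H^{-1} grad, or None if the Hessian is singular."""
    grad, hess = dog_derivatives(cube)
    return solve3(hess, [-g for g in grad])


def needs_relocation(offset, limit=0.5):
    """True if the extremum lies closer to a neighbouring sample point."""
    return any(abs(v) > limit for v in offset)
```

For samples drawn from an exactly quadratic function, the central differences are exact, so the sketch recovers the true extremum; on real DoG data the result is only an approximation, and practical implementations typically cap the number of relocation attempts.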

==== Discarding low-contrast keypoints ====
To discard the keypoints with low contrast, the value of the second-order Taylor expansion <math>D(\textbf{x})</math> is computed at the offset <math>\hat{\textbf{x}}</math>. If the absolute value <math>|D(\hat{\textbf{x}})|</math> is less than <math>0.03</math> (assuming image pixel values in the range [0, 1]), the candidate keypoint is discarded.<ref name="Lowe2004" /> Otherwise it is kept, with final scale-space location <math>\textbf{y} + \hat{\textbf{x}}</math>, where <math>\textbf{y}</math> is the original location of the keypoint.
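
The value at the extremum can be written in a shorter form. Because the extremum satisfies <math>\frac{\partial^2 D}{\partial \textbf{x}^2} \hat{\textbf{x}} = -\frac{\partial D}{\partial \textbf{x}}</math>, the quadratic term of the expansion equals <math>-\frac{1}{2} \frac{\partial D}{\partial \textbf{x}}^T \hat{\textbf{x}}</math>, and substituting it gives:

:<math>D(\hat{\textbf{x}}) = D + \frac{1}{2} \frac{\partial D}{\partial \textbf{x}}^T \hat{\textbf{x}}</math>

so the contrast test needs only the value and gradient of <math>D</math> at the candidate keypoint together with the offset <math>\hat{\textbf{x}}</math>.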

==== Eliminating edge responses ====
The DoG function has strong responses along edges, even if the candidate keypoint is not robust to small amounts of noise. Therefore, to increase stability, keypoints that have poorly determined locations but high edge responses must be eliminated.

For poorly defined peaks in the DoG function, the [[principal curvature]] across the edge would be much larger than the principal curvature along it. Finding these principal curvatures amounts to solving for the [[Eigenvalues and eigenvectors|eigenvalues]] of the second-order [[Hessian matrix]], '''H''':

:<math> \textbf{H} =  \begin{bmatrix}
  D_{xx} & D_{xy} \\
  D_{xy} & D_{yy}
\end{bmatrix} </math>

The eigenvalues of '''H''' are proportional to the principal curvatures of <math>D</math>. Only the ratio of the two eigenvalues is needed for SIFT's purposes: let <math>\alpha</math> be the larger eigenvalue and <math>\beta</math> the smaller one, with ratio <math>r = \alpha/\beta</math>. The trace of '''H''', i.e., <math>D_{xx} + D_{yy}</math>, gives the sum of the two eigenvalues, while its determinant, i.e., <math>D_{xx} D_{yy} - D_{xy}^2</math>, gives their product. The ratio <math> \text{R} = \operatorname{Tr}(\textbf{H})^2 / \operatorname{Det}(\textbf{H})</math> equals <math>(\alpha + \beta)^2 / (\alpha \beta) = (r+1)^2/r</math>, which depends only on the ratio of the eigenvalues rather than their individual values. R is at its minimum when the eigenvalues are equal and increases with <math>r</math>. Therefore, the larger the ratio between the two eigenvalues, and equivalently between the two principal curvatures of <math>D</math>, the higher the value of R. It follows that, for some threshold eigenvalue ratio <math>r_{\text{th}}</math>, if R for a candidate keypoint is larger than <math>(r_{\text{th}} + 1)^2/r_{\text{th}}</math>, that keypoint is poorly localized and hence rejected. The method described by Lowe uses <math>r_{\text{th}} = 10</math>.<ref name="Lowe2004" />
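
With <math>r_{\text{th}} = 10</math>, the threshold on R is <math>(10 + 1)^2 / 10 = 12.1</math>. The test can be sketched in Python as follows. The function name and argument order are hypothetical, and the second derivatives are assumed to have been estimated beforehand, for example by finite differences on the DoG image. The check for a non-positive determinant is an added safeguard: in that case the two curvatures have opposite signs (or one vanishes), so the point is not a well-defined extremum and the ratio is not meaningful.

```python
def passes_edge_test(dxx, dyy, dxy, r_th=10.0):
    """Return True if a candidate keypoint survives the edge-response test.

    dxx, dyy, dxy are the entries of the 2x2 Hessian of D at the keypoint.
    r_th is the threshold on the eigenvalue ratio (10 in the source text).
    """
    trace = dxx + dyy               # sum of the eigenvalues
    det = dxx * dyy - dxy * dxy     # product of the eigenvalues
    if det <= 0.0:
        # Curvatures of opposite sign, or a degenerate Hessian: reject.
        return False
    ratio = trace * trace / det     # R = Tr(H)^2 / Det(H)
    # Reject when R exceeds (r_th + 1)^2 / r_th; keep otherwise.
    return ratio <= (r_th + 1.0) ** 2 / r_th
```

For example, a blob-like point with <code>dxx = dyy = -1</code> and <code>dxy = 0</code> has R = 4 and is kept, whereas an edge-like point with <code>dxx = -10</code>, <code>dyy = -0.5</code> and <code>dxy = 0</code> has R = 22.05 and is rejected.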

This processing step for suppressing responses at edges is a transfer of a corresponding approach in the [[Harris corner detector|Harris operator]] for corner detection. The difference is that the measure for thresholding is computed from the Hessian matrix instead of a [[Structure tensor|second-moment matrix]].