Scale-invariant feature transform
== Competing methods ==

Alternative methods for scale-invariant object recognition under clutter and partial occlusion include the following.

RIFT<ref name="Lazebnik2004" /> is a rotation-invariant generalization of SIFT. The RIFT descriptor is constructed from circular normalized patches divided into concentric rings of equal width; within each ring, a gradient orientation histogram is computed. To maintain rotation invariance, the orientation at each point is measured relative to the direction pointing outward from the patch center.

RootSIFT<ref name="Arandjelovic2012" /> is a variant of SIFT that modifies the descriptor normalization. Because SIFT descriptors are histograms (and hence [[probability distribution]]s), [[Euclidean distance]] is not an accurate way to measure their similarity. Better similarity metrics are those tailored to probability distributions, such as the [[Bhattacharyya coefficient]] (also called the Hellinger kernel). To this end, the originally <math>\ell^2</math>-normalized descriptor is first <math>\ell^1</math>-normalized and the square root of each element is taken, followed by <math>\ell^2</math>-renormalization. After these algebraic manipulations, RootSIFT descriptors can be compared with ordinary [[Euclidean distance]], which is equivalent to using the Hellinger kernel on the original SIFT descriptors. This normalization scheme, termed "L1-sqrt", was previously introduced for the block normalization of [[Histogram of oriented gradients|HOG]] features, whose rectangular-block descriptor variant (R-HOG) is conceptually similar to the SIFT descriptor.

G-RIF<ref name="Sungho2006" /> (Generalized Robust Invariant Feature) is a general context descriptor that encodes edge orientation, edge density and [[hue]] information in a unified form, combining perceptual information with spatial encoding. The associated object recognition scheme uses neighboring-context-based voting to estimate object models.
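The RootSIFT transformation described above can be sketched in a few lines of NumPy. This is an illustrative implementation, not code from the original paper; the function name `root_sift` and the `eps` stabilizer are assumptions.

```python
import numpy as np

def root_sift(desc, eps=1e-7):
    """Map an L2-normalized SIFT descriptor to RootSIFT:
    L1-normalize, take the element-wise square root, then
    L2-renormalize (nearly a no-op, since the squared entries
    of the square-rooted vector already sum to ~1)."""
    desc = np.asarray(desc, dtype=np.float64)
    desc = desc / (np.abs(desc).sum() + eps)      # L1 normalization
    desc = np.sqrt(desc)                          # Hellinger mapping
    return desc / (np.linalg.norm(desc) + eps)    # L2 renormalization
```

Comparing two RootSIFT vectors with Euclidean distance is then equivalent to comparing the original histograms with the Hellinger kernel, since the squared distance equals 2 minus twice the Bhattacharyya coefficient of the L1-normalized descriptors.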
[[Speeded up robust features|SURF]]<ref name="Bay2006" /> (Speeded Up Robust Features) is a high-performance scale- and rotation-invariant interest point detector/descriptor claimed to approximate or even outperform previously proposed schemes with respect to repeatability, distinctiveness, and robustness. SURF relies on [[integral image]]s for image convolutions to reduce computation time and builds on the strengths of the leading existing detectors and descriptors, using a fast [[Hessian matrix]]-based measure for the detector and a distribution-based descriptor. It describes the distribution of [[Haar wavelet]] responses within the interest point neighborhood. Integral images are used for speed, and only 64 dimensions are used, reducing the time for feature computation and matching. The indexing step is based on the sign of the [[Laplace operator|Laplacian]], which increases both the matching speed and the robustness of the descriptor.

PCA-SIFT<ref name="Ke2004" /> and [[GLOH]]<ref name="Mikolajczyk2005" /> are variants of SIFT. The PCA-SIFT descriptor is a vector of image gradients in the x and y directions computed within the support region. The gradient region is sampled at 39×39 locations, so the raw vector has dimension 3042, which is reduced to 36 with [[Principal component analysis|PCA]]. Gradient location-orientation histogram ([[GLOH]]) is an extension of the SIFT descriptor designed to increase its robustness and distinctiveness. The SIFT descriptor is computed on a log-polar location grid with three bins in the radial direction (the radii set to 6, 11, and 15) and 8 bins in the angular direction, resulting in 17 location bins; the central bin is not divided in angular directions. The gradient orientations are quantized into 16 bins, yielding a 272-bin histogram. The size of this descriptor is reduced with [[Principal component analysis|PCA]].
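The PCA dimensionality reduction used by PCA-SIFT (3042 to 36 dimensions) and GLOH can be sketched with a plain SVD of the centered descriptor matrix. This is an illustrative helper, not code from either paper; the function name `pca_project` and its return convention are assumptions.

```python
import numpy as np

def pca_project(descriptors, n_components=36):
    """Project high-dimensional descriptors (rows of `descriptors`)
    onto their top principal components.

    The principal directions are the right singular vectors of the
    centered data matrix, i.e. the leading eigenvectors of the
    sample covariance matrix."""
    X = np.asarray(descriptors, dtype=np.float64)
    mean = X.mean(axis=0)
    Xc = X - mean                                   # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:n_components]                       # top directions, one per row
    return Xc @ basis.T, mean, basis
```

In the actual PCA-SIFT and GLOH pipelines the projection basis is estimated once, offline, from patches collected over many images, and then reused for every new descriptor.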
The [[covariance matrix]] for [[Principal component analysis|PCA]] is estimated on image patches collected from various images. The 128 largest [[Eigenvalues and eigenvectors|eigenvectors]] are used for description.

Gauss-SIFT<ref name=Lin15JMIV>{{Cite journal|title=Image Matching Using Generalized Scale-Space Interest Points|first=Tony|last=Lindeberg|date=May 1, 2015|journal=Journal of Mathematical Imaging and Vision|volume=52|issue=1|pages=3–36|doi=10.1007/s10851-014-0541-0|s2cid=254657377 |doi-access=free|bibcode=2015JMIV...52....3L }}</ref> is a pure image descriptor defined by performing all the image measurements underlying the SIFT descriptor with Gaussian derivative responses, as opposed to the derivative approximations in an image pyramid used in regular SIFT. In this way, discretization effects over space and scale can be reduced to a minimum, allowing for potentially more accurate image descriptors. In Lindeberg (2015),<ref name="Lin15JMIV" /> such pure Gauss-SIFT image descriptors were combined with a set of generalized scale-space interest points, comprising the [[Laplacian of the Gaussian]], the [[determinant of the Hessian]], four new unsigned or signed [[Hessian feature strength measures]], as well as [[Harris-Laplace detector|Harris-Laplace]] and [[Shi-and-Tomasi]] interest points. In an extensive experimental evaluation on a poster dataset comprising multiple views of 12 posters over scaling transformations up to a factor of 6 and viewing direction variations up to a slant angle of 45 degrees, it was shown that a substantial increase in the performance of image matching (higher efficiency scores and lower 1-[[Precision (information retrieval)|precision]] scores) could be obtained by replacing Laplacian of Gaussian interest points with determinant of the Hessian interest points.
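The determinant-of-the-Hessian response discussed above, and the unsigned Hessian feature strength measure det HL − k·trace²HL (clipped at zero) examined in this section, can be sketched as follows. This is a minimal NumPy illustration that approximates second derivatives with finite differences; a real scale-space implementation would use Gaussian derivative operators at each scale, and the function name `hessian_responses` is an assumption.

```python
import numpy as np

def hessian_responses(image, k=0.04):
    """Compute two Hessian-based interest point responses per pixel:
    the determinant of the Hessian det(HL), and the unsigned feature
    strength measure max(det(HL) - k * trace(HL)^2, 0).

    Second derivatives are approximated with central finite
    differences via repeated np.gradient calls."""
    L = np.asarray(image, dtype=np.float64)
    Ly, Lx = np.gradient(L)        # first-order derivatives
    Lyy, Lyx = np.gradient(Ly)     # second-order derivatives
    Lxy, Lxx = np.gradient(Lx)
    det_H = Lxx * Lyy - Lxy * Lyx
    trace_H = Lxx + Lyy
    d1 = np.maximum(det_H - k * trace_H ** 2, 0.0)
    return det_H, d1
```

Interest points would then be selected as local extrema of these responses over space (and, in a full implementation, over scale).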
Since difference-of-Gaussians interest points constitute a numerical approximation of Laplacian of the Gaussian interest points, this shows that a substantial increase in matching performance is possible by replacing the difference-of-Gaussians interest points in SIFT with determinant of the Hessian interest points. An additional increase in performance can furthermore be obtained by considering the unsigned Hessian feature strength measure <math>D_1 L = \operatorname{det} H L - k \, \operatorname{trace}^2 H L \,\, \mbox{if} \,\, \operatorname{det} H L - k \, \operatorname{trace}^2 H L > 0 \,\, \mbox{or 0 otherwise}</math>. A quantitative comparison between the Gauss-SIFT descriptor and a corresponding Gauss-SURF descriptor also showed that Gauss-SIFT generally performs significantly better than Gauss-SURF for a large number of different scale-space interest point detectors. This study therefore shows that, disregarding discretization effects, the pure image descriptor in SIFT is significantly better than the pure image descriptor in SURF, whereas the underlying interest point detector in SURF, which can be seen as a numerical approximation to scale-space extrema of the determinant of the Hessian, is significantly better than the underlying interest point detector in SIFT.

Wagner et al. developed two object recognition algorithms especially designed with the limitations of current mobile phones in mind.<ref name="Wagner2008" /> In contrast to the classic SIFT approach, Wagner et al. use the [[Features from accelerated segment test|FAST]] corner detector for feature detection. The algorithm also distinguishes between an off-line preparation phase, where features are created at different scale levels, and an on-line phase, where features are only created at the current fixed scale level of the phone's camera image. In addition, features are created from a fixed patch size of 15×15 pixels and form a SIFT descriptor with only 36 dimensions.
The approach has been further extended by integrating a [[Scalable Vocabulary Tree]] into the recognition pipeline.<ref name="Henze2009" /> This allows the efficient recognition of a larger number of objects on mobile phones. The approach is mainly restricted by the amount of available [[Random-access memory|RAM]].

KAZE and A-KAZE ''(KAZE Features and Accelerated-KAZE Features)'' are 2D feature detection and description methods reported to perform better than SIFT and SURF. They have gained considerable popularity due to their openly available source code. KAZE was originally developed by Pablo F. Alcantarilla, Adrien Bartoli and Andrew J. Davison.<ref>{{Cite web|url=http://www.robesafe.com/personal/pablo.alcantarilla/kaze.html|title=kaze|website=www.robesafe.com}}</ref>