== Overview ==
{{Technical|section|date=October 2010}}

For any object in an image, we can extract interest points that provide a "feature description" of the object. This description, extracted from a training image, can then be used to locate the object in a new (previously unseen) image containing other objects. For this to work reliably, the features must remain detectable even if the image is scaled, noisy, or differently illuminated. Such points usually lie in high-contrast regions of the image, such as object edges.

Another important characteristic of these features is that the relative positions between them in the original scene should not change from one image to another. For example, if only the four corners of a door were used as features, they would work regardless of the door's position; but if points in the frame were also used, recognition would fail when the door is opened or closed. Similarly, features located on articulated or flexible objects typically fail to match if the object's internal geometry changes between the two images being processed. In practice, SIFT detects and uses a much larger number of features from the images, which reduces the impact of such local variations on the overall matching error.

SIFT<ref name="patent" /> can robustly identify objects even among clutter and under partial occlusion, because the SIFT feature descriptor is invariant to [[Scaling (geometry)|uniform scaling]], [[Orientation (geometry)|orientation]], and illumination changes, and partially invariant to [[Affine transformation|affine distortion]].<ref name="Lowe1999" /> This section summarizes the original SIFT algorithm and mentions a few competing techniques for object recognition under clutter and partial occlusion.
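The scale-invariant detection step can be sketched as finding local extrema in a difference-of-Gaussians stack. This is a simplified illustration, not Lowe's full implementation: the octave structure is omitted, and the function name, scale spacing, and contrast threshold here are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_extrema(image, sigma=1.6, k=2 ** 0.5, num_scales=4, threshold=0.03):
    """Find candidate keypoints as local extrema of a difference-of-Gaussians stack."""
    img = image.astype(np.float64)
    # Blur at geometrically spaced scales and subtract adjacent levels.
    blurred = [gaussian_filter(img, sigma * k ** i) for i in range(num_scales)]
    dog = np.stack([blurred[i + 1] - blurred[i] for i in range(num_scales - 1)])
    # A pixel is a candidate if it is the maximum or minimum of its 3x3x3
    # neighbourhood (across space and scale) and its response is strong enough.
    maxima = (dog == maximum_filter(dog, size=3)) & (dog > threshold)
    minima = (dog == minimum_filter(dog, size=3)) & (dog < -threshold)
    scales, ys, xs = np.nonzero(maxima | minima)
    return list(zip(ys, xs, scales))

# Toy example: a bright blob should yield at least one extremum near its centre.
img = np.zeros((64, 64))
img[28:36, 28:36] = 1.0
keypoints = dog_extrema(img)
```

The full algorithm additionally refines each candidate with a sub-pixel fit and discards low-contrast and edge responses before computing descriptors.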
The SIFT descriptor is based on image measurements in terms of ''receptive fields''<ref name="KoeDoo87" /><ref name="KoeDoo92" /><ref name="Lin13BICY" /><ref name="Lin13-AdvImgPhy" /> over which ''local scale invariant reference frames''<ref name="Lin13PONE" /><ref name="Lin14CompVis" /> are established by ''local scale selection''.<ref name="Lin94Book" /><ref name="Lindeberg1998" /><ref name="Lin14CompVis" /> A general theoretical explanation is given in the Scholarpedia article on SIFT.<ref name="Lindeberg2012" />

{| class="wikitable"
! Problem !! Technique !! Advantage
|-
| Key localization / scale / rotation || [[Difference of Gaussians]] / [[Scale-space representation|scale-space pyramid]] / orientation assignment || Accuracy, stability, scale and rotational invariance
|-
| Geometric distortion || Blurring / resampling of local image orientation planes || Affine invariance
|-
| Indexing and matching || [[Nearest neighbor search|Nearest neighbor]] / [[Best Bin First]] search || Efficiency / speed
|-
| Cluster identification || [[Hough Transform]] voting || Reliable [[Pose (computer vision)|pose]] models
|-
| Model verification / outlier detection || [[Linear least squares]] || Better error tolerance with fewer matches
|-
| Hypothesis acceptance || [[Bayesian Probability]] analysis || Reliability
|}

=== Types of features ===
{{Unreferenced section|date=April 2022}}
The detection and description of local image features can help in object recognition. SIFT features are local, based on the appearance of the object at particular interest points, and invariant to image scale and rotation. They are also robust to changes in illumination, noise, and minor changes in viewpoint. In addition, they are highly distinctive, relatively easy to extract, and allow correct object identification with a low probability of mismatch.
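The linear-least-squares verification step listed in the table above can be sketched as fitting a 2D affine transform to putative point matches and flagging outliers by residual. This is an illustrative sketch, not the full verification procedure; the function names, point data, and outlier threshold are assumptions.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src points onto dst points.

    Each match contributes two linear equations in the six affine parameters
    (a, b, c, d, tx, ty); three matches suffice, more over-determine the fit.
    """
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    n = len(src)
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src   # a*x + b*y + tx = x'
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = src   # c*x + d*y + ty = y'
    A[1::2, 5] = 1.0
    params, *_ = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)
    return params  # (a, b, c, d, tx, ty)

def apply_affine(params, pts):
    a, b, c, d, tx, ty = params
    pts = np.asarray(pts, float)
    return np.stack([a * pts[:, 0] + b * pts[:, 1] + tx,
                     c * pts[:, 0] + d * pts[:, 1] + ty], axis=1)

# Example: points related by a rotation and translation, plus one bad match.
rng = np.random.default_rng(0)
src = rng.uniform(0, 100, (8, 2))
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
dst = src @ R.T + np.array([5.0, -2.0])
dst[0] += 40.0  # simulated mismatch
params = fit_affine(src, dst)
residuals = np.linalg.norm(apply_affine(params, src) - dst, axis=1)
outliers = np.nonzero(residuals > 3 * np.median(residuals))[0]
```

The mismatched point produces the largest residual, which is how the verification stage discards matches inconsistent with the hypothesized pose.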
They are relatively easy to match against a (large) database of local features, but the high dimensionality can be an issue, so probabilistic algorithms such as [[k-d tree]]s with [[best bin first]] search are generally used. Object description by a set of SIFT features is also robust to partial occlusion: as few as three SIFT features from an object are enough to compute its location and pose. Recognition can be performed in close to real time, at least for small databases on modern computer hardware.{{Citation needed|date=August 2008}}
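The nearest-neighbour matching step can be sketched with a k-d tree and Lowe's distance-ratio test, which keeps a match only when the nearest database descriptor is clearly closer than the second nearest. The 0.8 ratio follows Lowe's paper; the toy 2-D "descriptors" (real SIFT descriptors are 128-D) and the function name are made up for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_descriptors(query, database, ratio=0.8):
    """Match query descriptors to a database, keeping only distinctive matches."""
    tree = cKDTree(database)
    dist, idx = tree.query(query, k=2)  # two nearest neighbours per query
    matches = []
    for qi in range(len(query)):
        d1, d2 = dist[qi]
        if d1 < ratio * d2:  # ratio test: reject ambiguous matches
            matches.append((qi, idx[qi][0]))
    return matches

# Toy example with 2-D stand-ins for descriptors.
db = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
queries = np.array([[0.5, 0.0],   # clearly closest to db[0]
                    [5.0, 0.0]])  # equidistant from db[0] and db[1]: rejected
matches = match_descriptors(queries, db)
```

The first query is matched to database entry 0; the second is discarded because its two nearest neighbours are equally close, so the match is not distinctive.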