{{Short description|Technique in digital image processing}}
{{About|template matching in digital image processing|template matching in psychology|Template matching theory}}

'''Template matching'''<ref>R. Brunelli, ''Template Matching Techniques in Computer Vision: Theory and Practice'', Wiley, {{ISBN|978-0-470-51706-2}}, 2009 ''([http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470517069.html] TM book)''</ref> is a technique in [[digital image processing]] for finding small parts of an image that match a template image. It can be used for [[quality control]] in manufacturing,<ref>{{Cite journal | doi=10.1023/B:JIMS.0000034120.86709.8c|title = An industrial visual inspection system that uses inductive learning| journal=Journal of Intelligent Manufacturing| volume=15| issue=4| pages=569–574|year = 2004|last1 = Aksoy|first1 = M. S.| last2=Torkul| first2=O.| last3=Cedimoglu| first3=I. H.|s2cid = 35493679}}</ref> [[robotic navigation|navigation of mobile robots]],<ref>Kyriacou, Theocharis, Guido Bugmann, and Stanislao Lauria. "[https://pearl.plymouth.ac.uk/bitstream/handle/10026.1/2778/THEOCHARIS%20KYRIACOU.PDF?sequence=1&isAllowed=y Vision-based urban navigation procedures for verbally instructed robots]." Robotics and Autonomous Systems 51.1 (April 30, 2005): 69-80. Expanded Academic ASAP. Thomson Gale.</ref> or [[edge detection]] in images.<ref>WANG, CHING YANG, Ph.D. "EDGE DETECTION USING TEMPLATE MATCHING (IMAGE PROCESSING, THRESHOLD LOGIC, ANALYSIS, FILTERS)". Duke University, 1985, 288 pages; AAT 8523046</ref>

The main challenges in a template matching task are detection of occlusion, when a sought-after object is partly hidden in an image; detection of non-rigid transformations, when an object is distorted or imaged from different angles; sensitivity to illumination and background changes; background clutter; and scale changes.<ref name="Talmi2016">{{cite arXiv|last1=Talmi|first1=Itamar|last2=Mechrez|first2=Roey|last3=Zelnik-Manor|first3=Lihi|date=2016-12-07|title=Template Matching with Deformable Diversity Similarity|eprint=1612.02190|class=cs.CV}}</ref>

==Feature-based approach==
[[File:Artificial Neural Network.jpg|thumb|The hidden layer outputs a vector that holds classification information about the image and is used in template matching as the features of the image]]
The feature-based approach to template matching relies on the extraction of [[Feature (computer vision)|image features]], such as shapes, textures, and colors, that match the target image or frame. This approach is usually achieved using [[Neural network|neural networks]] and [[Deep learning|deep-learning]] [[Statistical classification|classifiers]] such as VGG, [[AlexNet]], and [[Residual neural network|ResNet]].{{Citation needed|date=January 2023}} [[Convolutional neural network|Convolutional neural networks]] (CNNs), which many modern classifiers are based on, process an image by passing it through different hidden layers, producing a [[Vector space|vector]] at each layer with classification information about the image. These vectors are extracted from the network and used as the features of the image.

[[Feature extraction]] using [[deep neural networks]], like CNNs, has proven extremely effective and has become the standard in state-of-the-art template matching algorithms.<ref>{{cite arXiv|last1=Zhang|first1=Richard|last2=Isola|first2=Phillip|last3=Efros|first3=Alexei A.|last4=Shechtman|first4=Eli|last5=Wang|first5=Oliver|date=2018-01-11|title=The Unreasonable Effectiveness of Deep Features as a Perceptual Metric|eprint=1801.03924|class=cs.CV}}</ref> The feature-based approach is often more robust than the template-based approach described below, as it can match templates under non-rigid and out-of-plane [[Rigid transformation|transformations]] as well as under high background clutter and illumination changes.<ref name="Talmi2016" /><ref>Li, Yuhai, L. Jian, T. Jinwen, X. Honbo. "[https://www.spiedigitallibrary.org/conference-proceedings-of-spie/6043/60431P/A-fast-rotated-template-matching-based-on-point-feature/10.1117/12.654932.short A fast rotated template matching based on point feature]." Proceedings of the SPIE 6043 (2005): 453-459. MIPPR 2005: SAR and Multispectral Image Processing.</ref><ref>B. Sirmacek, C. Unsalan. "[https://www.iro.umontreal.ca/~mignotte/IFT6150/Articles/graphbuilding.pdf Urban Area and Building Detection Using SIFT Keypoints and Graph Theory]", IEEE Transactions on Geoscience and Remote Sensing, Vol.47 (4), pp. 1156-1167, April 2009.</ref>
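Once feature vectors have been extracted for the template and for a candidate image region, matching reduces to comparing the two vectors. The following minimal C sketch illustrates one common comparison, the cosine similarity of two feature vectors; it assumes the vectors have already been produced (for example, by a CNN hidden layer), and all names are illustrative rather than taken from any particular library.

<syntaxhighlight lang="c">
#include <math.h>
#include <stddef.h>

/* Cosine similarity between two feature vectors of length n.
   The vectors are assumed to have been computed beforehand, e.g. by a
   CNN hidden layer; the feature extraction itself is not shown here. */
double cosine_similarity(const double *a, const double *b, size_t n)
{
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (size_t i = 0; i < n; i++) {
        dot    += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    if (norm_a == 0.0 || norm_b == 0.0)
        return 0.0; /* degenerate vector: treat as no match */
    return dot / (sqrt(norm_a) * sqrt(norm_b));
}
</syntaxhighlight>

A score near 1 indicates that the candidate region's features closely match the template's, while a score near 0 indicates little similarity.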
==Template-based approach==
[[File:Template Matching.png|thumb|Template matching with rotated templates]]
For templates without strong [[Feature (computer vision)|features]], or when the bulk of the template image constitutes the matching image as a whole, a template-based approach may be effective. Since template-based matching may require sampling of a large number of data points, it is often desirable to reduce the number of sampling points by reducing the resolution of the search and template images by the same factor before performing the operation on the resulting downsized images. This [[Data pre-processing|pre-processing]] method creates a multi-scale, or [[Pyramid (image processing)|pyramid]], representation of images, providing a reduced search window of data points within a search image so that the template does not have to be compared with every viable data point. Pyramid representations are a method of [[dimensionality reduction]], a common aim of machine learning on data sets that suffer from the [[Curse of dimensionality#Data mining|curse of dimensionality]].
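As an illustration of this pre-processing step, the following minimal C sketch builds one level of an image pyramid by halving the resolution of a greyscale image with 2x2 block averaging; the flat row-major image layout and all names are assumptions made for the example.

<syntaxhighlight lang="c">
/* Halve the resolution of a greyscale image by averaging each 2x2
   block, producing one level of an image pyramid.  Images are assumed
   to be stored row-major as 8-bit intensities; all names here are
   illustrative. */
void pyramid_downsample(const unsigned char *src, int src_rows, int src_cols,
                        unsigned char *dst /* (src_rows/2) x (src_cols/2) */)
{
    int dst_rows = src_rows / 2, dst_cols = src_cols / 2;
    for (int r = 0; r < dst_rows; r++) {
        for (int c = 0; c < dst_cols; c++) {
            int sum = src[(2*r)     * src_cols + 2*c]
                    + src[(2*r)     * src_cols + 2*c + 1]
                    + src[(2*r + 1) * src_cols + 2*c]
                    + src[(2*r + 1) * src_cols + 2*c + 1];
            dst[r * dst_cols + c] = (unsigned char)(sum / 4);
        }
    }
}
</syntaxhighlight>

Applying the same function repeatedly, to both the search and template images, yields the multi-scale representation described above.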
==Common challenges==
In instances where the template may not provide a direct match, it may be useful to implement [[eigenspace]]s to create templates that detail the matching object under a number of different conditions, such as varying perspectives, illuminations, [[Contrast (vision)|color contrasts]], or object [[pose (computer vision)|poses]].<ref>Luis A. Mateos, Dan Shao and Walter G. Kropatsch. [https://link.springer.com/content/pdf/10.1007/978-3-642-10268-4_104.pdf Expanding Irregular Graph Pyramid for an Approaching Object]. CIARP 2009: 885-891.</ref> For example, if an algorithm is looking for a face, its template eigenspaces may consist of images (i.e., templates) of faces in different positions relative to the camera, in different lighting conditions, or with different expressions (i.e., poses).

It is also possible for a matching image to be obscured or occluded by an object. In these cases, it is unreasonable to provide a multitude of templates to cover each possible occlusion. For example, the search object may be a playing card, and in some of the search images, the card is obscured by the fingers of someone holding the card, or by another card on top of it, or by some other object in front of the camera. In cases where the object is malleable or poseable, motion becomes an additional problem, and problems involving both motion and occlusion become ambiguous.<ref>F. Jurie and M. Dhome. [https://www.researchgate.net/profile/Michel_Dhome/publication/221260111_Real_Time_Robust_Template_Matching/links/00b49527217af23045000000/Real-Time-Robust-Template-Matching.pdf Real time robust template matching]. In British Machine Vision Conference, pages 123–131, 2002.</ref> In these cases, one possible solution is to divide the template image into multiple sub-images and perform matching on each subdivision.

==Deformable templates in computational anatomy==
{{Further|Computational anatomy|Group actions in computational anatomy}}
Template matching is a central tool in [[computational anatomy]] (CA). In this field, a [[Computational anatomy#The deformable template orbit model of computational anatomy|deformable template model]] is used to model the space of human anatomies and their [[Orbit (control theory)|orbits]] under the [[Group (mathematics)|group]] of [[Diffeomorphism|diffeomorphisms]], functions which smoothly deform an object.<ref>{{cite journal |last1=Christensen |first1=G.E. |last2=Rabbitt |first2=R.D. |last3=Miller |first3=M.I. |date=October 1996 |title=Deformable template model using large deformation kinematics |journal=IEEE Transactions on Image Processing |volume=5 |issue=10 |pages=1435–1447 |doi=10.1109/83.536892 |pmid=18290061}}</ref> Template matching arises as an approach to finding the unknown diffeomorphism that acts on a template image to match the target image. Template matching algorithms in CA have come to be called [[large deformation diffeomorphic metric mapping|large deformation diffeomorphic metric mappings]] (LDDMMs). Currently, there are LDDMM template matching algorithms for matching anatomical [[Landmark point|landmark points]], [[Curve|curves]], [[Surface (topology)|surfaces]], and volumes.

==Template-based matching explained using cross correlation or sum of absolute differences==
A basic method of template matching, sometimes called "linear spatial filtering", uses an image patch (i.e., the "template image" or "filter mask") tailored to a specific [[Feature (computer vision)|feature]] of search images that it is meant to detect.{{citation needed|date=May 2020}} This technique can be easily performed on grey images or [[Edge detection|edge]] images, where the additional variable of color is either not present or not relevant.

[[Cross correlation]] techniques compare the similarities of the search and template images. Their outputs should be highest at places where the image structure matches the template structure, i.e., where large search image values get multiplied by large template image values.

This method is normally implemented by first picking out a part of a search image to use as a template. Let <math>S(x,y)</math> represent the value of a search image pixel, where <math>(x,y)</math> represents the coordinates of the [[pixel]] in the search image. For simplicity, assume pixel values are scalar, as in a [[Grayscale images|greyscale image]]. Similarly, let <math display="inline">T(x_t,y_t)</math> represent the value of a template pixel, where <math display="inline">(x_t,y_t)</math> represents the coordinates of the pixel in the template image. To apply the filter, simply move the center (or origin) of the template image over each point in the search image and calculate the sum of products, similar to a [[dot product]], between the pixel values in the search and template images over the whole area spanned by the template. More formally, if <math>(0,0)</math> is the center (or origin) of the template image, then the cross correlation <math>T\star S</math> at each point <math>(x,y)</math> in the search image can be computed as:
<math display="block">(T\star S)(x,y) = \sum_{(x_t,y_t)\in T} T(x_t,y_t) \cdot S(x_t+x, y_t+y)</math>
For convenience, <math>T</math> denotes both the pixel values of the template image as well as its [[Domain of a function|domain]], the bounds of the template. Note that all possible positions of the template with respect to the search image are considered. Since cross correlation values are greatest when the values of the search and template pixels align, the best matching position <math>(x_m,y_m)</math> corresponds to the maximum value of <math>T\star S</math> over <math>S</math>.
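As a concrete illustration of the formula above, the following minimal C sketch computes the cross correlation at every valid offset of a greyscale template inside a greyscale search image and records the position of the maximum. For simplicity, the template origin is taken at its top-left corner rather than its center, images are stored row-major, and all names are illustrative.

<syntaxhighlight lang="c">
/* Exhaustive cross correlation of a template T over a search image S.
   Both images are greyscale, stored row-major; the template origin is
   taken at its top-left corner, so only fully overlapping offsets are
   scored.  All names are illustrative. */
void best_match_cc(const unsigned char *S, int S_rows, int S_cols,
                   const unsigned char *T, int T_rows, int T_cols,
                   int *best_row, int *best_col)
{
    double max_cc = -1.0;
    *best_row = *best_col = 0;
    for (int y = 0; y <= S_rows - T_rows; y++) {
        for (int x = 0; x <= S_cols - T_cols; x++) {
            double cc = 0.0;
            /* sum of products over the area spanned by the template */
            for (int i = 0; i < T_rows; i++)
                for (int j = 0; j < T_cols; j++)
                    cc += (double)T[i * T_cols + j]
                        * (double)S[(y + i) * S_cols + (x + j)];
            if (cc > max_cc) {
                max_cc = cc;
                *best_row = y;
                *best_col = x;
            }
        }
    }
}
</syntaxhighlight>

In practice, a normalized form of cross correlation is often preferred, since the raw sum of products can be dominated by uniformly bright regions of the search image.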
Another way to handle translation problems on images using template matching is to compare the intensities of the [[pixel]]s, using the [[sum of absolute differences]] (SAD) measure. To formulate this, let <math>I_S(x_s,y_s)</math> and <math>I_T(x_t,y_t)</math> denote the [[Luminous intensity|light intensity]] of pixels in the search and template images with coordinates <math>(x_s,y_s)</math> and <math>(x_t,y_t)</math>, respectively. Then, by moving the center (or origin) of the template to a point <math>(x,y)</math> in the search image, as before, the sum of [[absolute difference|absolute differences]] between the template and search pixel intensities at that point is:
<math display="block"> SAD(x, y) = \sum_{(x_t,y_t)\in T} \left\vert I_T(x_t,y_t) - I_S(x_t+x,y_t+y) \right\vert </math>
With this measure, the ''lowest'' SAD gives the best position for the template, rather than the greatest as with cross correlation. SAD tends to be relatively simple to implement and understand, but it also tends to be relatively slow to execute. A simple [[C (programming language)|C]] implementation of SAD template matching is given below.
== Implementation ==
In this simple implementation, it is assumed that the method described above is applied to grey images, which is why '''Grey''' is used as the pixel intensity. The final position gives the top-left location where the template image best matches the search image.

<syntaxhighlight lang="c">
// S: search image (S_rows x S_cols), T: template image (T_rows x T_cols),
// both greyscale; the pixel type and the position struct are assumed to
// be defined elsewhere.
double minSAD = DBL_MAX;  // smallest SAD found so far (requires <float.h>)

// loop through the search image
for (size_t x = 0; x <= S_cols - T_cols; x++) {
    for (size_t y = 0; y <= S_rows - T_rows; y++) {
        double SAD = 0.0;

        // loop through the template image
        for (size_t j = 0; j < T_cols; j++) {
            for (size_t i = 0; i < T_rows; i++) {
                pixel p_SearchIMG = S[y + i][x + j];
                pixel p_TemplateIMG = T[i][j];
                SAD += abs(p_SearchIMG.Grey - p_TemplateIMG.Grey);
            }
        }

        // save the best found position
        if (SAD < minSAD) {
            minSAD = SAD;
            position.bestRow = y;
            position.bestCol = x;
            position.bestSAD = SAD;
        }
    }
}
</syntaxhighlight>

One way to perform template matching on color images is to decompose the [[pixel]]s into their color components and measure the quality of match between the color template and search image using the sum of the SAD computed for each color separately.

== Speeding up the process ==
In the past, this type of spatial filtering was normally only used in dedicated hardware solutions because of the computational complexity of the operation.<ref>Gonzalez, R, Woods, R, Eddins, S "[https://web.archive.org/web/20190723164704/https://pdfs.semanticscholar.org/ff38/489bbc62d765c8621b32e121ec2814e4fb1f.pdf Digital Image Processing using Matlab]" Prentice Hall, 2004</ref> However, this complexity can be lessened by filtering in the frequency domain of the image, referred to as "frequency domain filtering"; this is done through the use of the [[convolution theorem]].

Another way of speeding up the matching process is through the use of an image pyramid. This is a series of images, at different scales, which are formed by repeatedly filtering and subsampling the original image in order to generate a sequence of reduced-resolution images.<ref>E. H. Adelson, C. H. Anderson, J. R. Bergen, P. J. Burt and J. M. Ogden, Pyramid methods in image processing http://web.mit.edu/persci/people/adelson/pub_pdfs/RCA84.pdf</ref> These lower-resolution images can then be searched for the template (with a similarly reduced resolution) in order to yield possible start positions for searching at the larger scales. The larger images can then be searched in a small window around the start position to find the best template location.
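The following minimal C sketch shows how such a two-level coarse-to-fine search can be organized. It assumes half-resolution search and template images produced by a downsampling step such as the one sketched earlier, and a routine <code>sad_search</code> that runs the exhaustive SAD scan shown above restricted to a window of template positions; the <code>Image</code> struct and all names are illustrative.

<syntaxhighlight lang="c">
typedef struct { const unsigned char *data; int rows, cols; } Image;

/* Assumed helper: exhaustive SAD scan of T over S, restricted to
   template positions [row0,row1) x [col0,col1); writes the best match. */
void sad_search(const Image *S, const Image *T,
                int row0, int row1, int col0, int col1,
                int *best_row, int *best_col);

/* Two-level coarse-to-fine SAD search over an image pyramid. */
void coarse_to_fine(const Image *S, const Image *T,           /* full size */
                    const Image *S_half, const Image *T_half, /* half size */
                    int *best_row, int *best_col)
{
    int r_half = 0, c_half = 0;

    /* 1. Exhaustive search at the coarse level: cheap, because the
          images cover only a quarter of the original area. */
    sad_search(S_half, T_half,
               0, S_half->rows - T_half->rows + 1,
               0, S_half->cols - T_half->cols + 1,
               &r_half, &c_half);

    /* 2. Project the coarse match back to full resolution and refine
          within a small window (here +/- 2 pixels) around it. */
    int r0 = 2 * r_half - 2, r1 = 2 * r_half + 3;
    int c0 = 2 * c_half - 2, c1 = 2 * c_half + 3;
    if (r0 < 0) r0 = 0;
    if (c0 < 0) c0 = 0;
    if (r1 > S->rows - T->rows + 1) r1 = S->rows - T->rows + 1;
    if (c1 > S->cols - T->cols + 1) c1 = S->cols - T->cols + 1;

    sad_search(S, T, r0, r1, c0, c1, best_row, best_col);
}
</syntaxhighlight>

The exhaustive scan is paid only at the coarse level; at full resolution, only a small window of candidate offsets around the projected coarse match is evaluated.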
Other methods can handle problems such as translation, scale, image rotation and even all affine transformations.<ref>Yuan, Po, M.S.E.E. "Translation, scale, rotation and threshold invariant pattern recognition system". The University of Texas at Dallas, 1993, 62 pages; AAT EP13780</ref><ref>H. Y. Kim and S. A. Araújo, "[https://www.researchgate.net/profile/Hae_Kim2/publication/221411551_Grayscale_Template-Matching_Invariant_to_Rotation_Scale_Translation_Brightness_and_Contrast/links/00463516819b8ca844000000/Grayscale-Template-Matching-Invariant-to-Rotation-Scale-Translation-Brightness-and-Contrast.pdf Grayscale Template-Matching Invariant to Rotation, Scale, Translation, Brightness and Contrast]," IEEE Pacific-Rim Symposium on Image and Video Technology, Lecture Notes in Computer Science, vol. 4872, pp. 100-113, 2007.</ref><ref>Korman S., Reichman D., Tsur G. and Avidan S., "[http://openaccess.thecvf.com/content_cvpr_2013/papers/Korman_FasT-Match_Fast_Affine_2013_CVPR_paper.pdf FAsT-Match: Fast Affine Template Matching]", CVPR 2013.</ref>

==Improving the accuracy of the matching==
Improvements can be made to the matching method by using more than one template (eigenspaces); these additional templates can have different scales and rotations.

It is also possible to improve the accuracy of the matching method by hybridizing the feature-based and template-based approaches.<ref>C. T. Yuen, M. Rizon, W. S. San, and T. C. Seong. "[http://103.86.130.60/xmlui/handle/123456789/15090 Facial Features for Template Matching Based Face Recognition]." American Journal of Engineering and Applied Sciences 3 (1): 899-903, 2010.</ref> Naturally, this requires that the search and template images have features that are apparent enough to support feature matching.

==Similar methods==
Similar methods include [[stereo matching]], [[image registration]], and the [[scale-invariant feature transform]].

==Examples of use==
Template matching has various applications and is used in such fields as face recognition (see [[facial recognition system]]) and medical image processing. Systems have been developed and used in the past to count the number of faces that walk across part of a bridge within a certain amount of time. Other systems include automated calcified nodule detection within digital chest X-rays.<ref>Ashley Aberneithy. "Automatic Detection of Calcified Nodules of Patients with Tuberculous". University College London, 2007</ref> Template matching has also been implemented in geostatistical simulation, where it provides a fast algorithm.<ref>Tahmasebi, P., Hezarkhani, A., Sahimi, M., 2012, [https://doi.org/10.1007%2Fs10596-012-9287-1 Multiple-point geostatistical modeling based on the cross-correlation functions], Computational Geosciences, 16(3):779-797.</ref>

==See also==
* [[Facial recognition system]]
* [[Pattern recognition]]
* [[Computer vision]]
* [[Elastic Matching]]

==References==
{{Reflist}}

==External links==
*[http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html Template Matching in OpenCV]
*[http://www.araa.asn.au/acra/acra2004/papers/cole.pdf Visual Object Recognition using Template Matching]
*[http://www.lps.usp.br/~hae/software/cirateg/index.html Rotation, scale, translation-invariant template matching demonstration program]
*[http://campar.in.tum.de/Main/AndreasHofhauser Perspective-invariant template matching]
*[https://brunelli.modena.ovh/roberto_brunelli/TM/html/tmCodeCompanionli20.html An extensive template matching bibliography up to 2009]

{{Authority control}}

{{DEFAULTSORT:Template Matching}}
[[Category:Image processing]]