Distance matrix
== Data Mining and Machine Learning ==

=== Data Mining ===
A common task in data mining is applying [[cluster analysis]] to a data set so that the items within a group are more similar to each other than to items in other groups. Because similarity can be measured with a distance metric, distance matrices are heavily used in [[cluster analysis]]: the distance matrix represents the similarity measure between every pair of data points in the set.

==== Hierarchical clustering ====
A distance matrix is necessary for traditional [[hierarchical clustering]] algorithms, which are often heuristic methods employed in biological sciences such as phylogeny reconstruction. When implementing a hierarchical clustering algorithm in data mining, the distance matrix contains the pairwise distances between all points, and clusters are then formed by merging points or clusters based entirely on the distances in the matrix.

If ''N'' is the number of points, the complexity of hierarchical clustering is:
* Time complexity is <math>O(N^3)</math>, due to the repeated calculations needed after every merge to update the distance matrix
* Space complexity is <math>O(N^2)</math>

=== Machine Learning ===
Distance metrics are a key part of several machine learning algorithms, which are used in both [[Supervised learning|supervised]] and [[unsupervised learning]]. They are generally used to calculate the similarity between data points: this is where the distance matrix is an essential element.
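As a rough illustration of how a distance matrix is built from a metric and then consumed by an unsupervised method, the sketch below computes all pairwise Euclidean distances and runs a naive single-linkage agglomerative clustering on them. The function names, the choice of single linkage, and the sample points are assumptions made for this example and do not come from a particular library. The clustering step recomputes cluster distances on every merge instead of updating the matrix in place, so it is meant for reading, not for performance.

```python
import math


def distance_matrix(points):
    """Return the symmetric N x N matrix of Euclidean distances."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d  # the matrix is symmetric
    return dist


def single_linkage(dist, num_clusters):
    """Naive agglomerative clustering driven only by the distance matrix.

    Repeatedly merges the two closest clusters, where the distance between
    two clusters is the smallest distance between any of their members.
    """
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")
    clusters = [[i] for i in range(len(dist))]  # start with singletons
    while len(clusters) > num_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical sample data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
D = distance_matrix(points)
groups = single_linkage(D, 3)
print(groups)
```

Here `groups` holds lists of point indices, one list per cluster. Only the matrix `D` is passed to the clustering step, which mirrors the idea that the merges are decided entirely by the stored distances.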
The use of an effective distance matrix improves the performance of the machine learning model, whether it is for classification tasks or for clustering.<ref>{{Cite web |date=February 25, 2020 |title=4 types of distance metrics in machine learning |url=https://www.analyticsvidhya.com/blog/2020/02/4-types-of-distance-metrics-in-machine-learning/ }}</ref>

==== K-Nearest Neighbors ====
A distance matrix is utilized in the [[k-NN algorithm]], one of the simplest and most widely used instance-based machine learning algorithms, which can be applied to both classification and regression tasks. It is also one of the slowest at prediction time, since each test sample's prediction requires computing the distances between that test sample and every training sample in the training set. Once these distances are computed, the algorithm selects the ''K'' training samples closest to the test sample and predicts the test sample's result from the selected set's majority (classification) or average (regression) value.

* Prediction time complexity is <math>O(k \cdot n \cdot d)</math>, to compute the distance between each test sample and every training sample to construct the distance matrix, where:
# k = number of nearest neighbors selected
# n = size of the training set
# d = number of dimensions being used for the data

This classification-focused model predicts the label of the target based on the distance matrix between the target and each of the training samples, which determines the ''K'' samples nearest to the target.
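The prediction step described above can be sketched as follows. This is a minimal brute-force version that assumes Euclidean distance and plain Python lists; the name `knn_predict` and the sample data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Predict a label for `query` by majority vote of its k nearest neighbors."""
    # One row of the distance matrix: the query against every training sample.
    distances = [math.dist(query, p) for p in train_points]
    # Indices of the k training samples with the smallest distances.
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    # Majority vote over the labels of the selected neighbors.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set with two well-separated classes.
train_points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
                (6.0, 6.0), (7.0, 6.0), (6.0, 7.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(train_points, train_labels, (1.5, 1.5), k=3)
print(prediction)
```

For regression, the vote would be replaced by the mean of the neighbors' target values. Sorting every distance is the simplest way to pick the nearest samples; a partial selection would avoid the full sort but is left out here to keep the sketch short.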
{{Photo montage|
| photo1a = DistanceMatrix_KNN.png{{!}}The distance matrix used to select K train samples for K-nn
| photo1b = K_nearestNeighborVisual.png{{!}}Machine Learning model predicting target value with K-NN
| size = 650
| border = 0
| color = transparent
}}

=== Computer Vision ===
A distance matrix can be used in [[Neural network|neural networks]] for 2D to 3D regression in image predicting machine learning models.