== Graph partitioning methods ==

[[Graph (data structure)|Graph]] partitioning methods are effective tools for image segmentation, since they model the impact of pixel neighborhoods on a given cluster of pixels or pixel, under the assumption of homogeneity in images. In these methods, the image is modeled as a weighted, [[undirected graph]]. Usually a pixel or a group of pixels is associated with [[Vertex (graph theory)|nodes]], and [[Glossary of graph theory#Basics|edge]] weights define the (dis)similarity between neighboring pixels. The graph (image) is then partitioned according to a criterion designed to model "good" clusters. Each partition of the nodes (pixels) output from these algorithms is considered an object segment in the image; see [[Segmentation-based object categorization]]. Some popular algorithms of this category are normalized cuts,<ref>Jianbo Shi and [[Jitendra Malik]] (2000): [https://www.cs.cmu.edu/~jshi/papers/pami_ncut.pdf "Normalized Cuts and Image Segmentation"], ''IEEE Transactions on Pattern Analysis and Machine Intelligence'', pp. 888–905, Vol. 22, No. 8</ref> [[random walker (computer vision)|random walker]],<ref>Leo Grady (2006): [http://vision.cse.psu.edu/people/chenpingY/paper/grady2006random.pdf "Random Walks for Image Segmentation"], ''IEEE Transactions on Pattern Analysis and Machine Intelligence'', pp. 1768–1783, Vol. 28, No. 11</ref> minimum cut,<ref>Z. Wu and R. Leahy (1993): [ftp://sipi.usc.edu/pub/leahy/pdfs/MAP93.pdf "An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation"]{{dead link|date=May 2025|bot=medic}}{{cbignore|bot=medic}}, ''IEEE Transactions on Pattern Analysis and Machine Intelligence'', pp. 1101–1113, Vol. 15, No. 11</ref> isoperimetric partitioning,<ref>Leo Grady and Eric L. Schwartz (2006): [http://www.cns.bu.edu/~lgrady/grady2006isoperimetric.pdf "Isoperimetric Graph Partitioning for Image Segmentation"] {{Webarchive|url=https://web.archive.org/web/20110719090404/http://www.cns.bu.edu/~lgrady/grady2006isoperimetric.pdf |date=19 July 2011 }}, ''IEEE Transactions on Pattern Analysis and Machine Intelligence'', pp. 469–475, Vol. 28, No. 3</ref> [[minimum spanning tree-based segmentation]],<ref>C. T. Zahn (1971): [http://web.cse.msu.edu/~cse802/Papers/zahn.pdf "Graph-theoretical methods for detecting and describing gestalt clusters"], ''IEEE Transactions on Computers'', pp. 68–86, Vol. 20, No. 1</ref> and [[segmentation-based object categorization]].
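To make the graph construction concrete, the following is a minimal sketch (not drawn from any of the cited papers) of the spectral relaxation behind a two-way normalized cut: it assumes a grayscale image, a 4-connected pixel grid, and Gaussian intensity affinities; the function names and the scale parameter <code>sigma</code> are illustrative choices.

<syntaxhighlight lang="python">
import numpy as np
from scipy.sparse import coo_matrix, diags
from scipy.sparse.linalg import eigsh

def grid_graph_weights(img, sigma=0.1):
    """Weighted undirected graph over a grayscale image: nodes are pixels,
    edges join 4-neighbours, and w_ij = exp(-(I_i - I_j)^2 / sigma^2)."""
    h, w = img.shape
    idx = np.arange(h * w).reshape(h, w)
    flat = img.ravel()
    rows, cols, vals = [], [], []
    for di, dj in ((0, 1), (1, 0)):                 # right and down neighbours
        a = idx[:h - di, :w - dj].ravel()
        b = idx[di:, dj:].ravel()
        wgt = np.exp(-(flat[a] - flat[b]) ** 2 / sigma ** 2)
        rows += [a, b]; cols += [b, a]; vals += [wgt, wgt]   # keep W symmetric
    return coo_matrix((np.concatenate(vals),
                       (np.concatenate(rows), np.concatenate(cols))),
                      shape=(h * w, h * w)).tocsr()

def ncut_bipartition(W):
    """Two-way normalized cut via the spectral relaxation: threshold the
    second-largest eigenvector of D^(-1/2) W D^(-1/2), mapped back by D^(-1/2)."""
    d = np.asarray(W.sum(axis=1)).ravel()           # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    A = diags(d_inv_sqrt) @ W @ diags(d_inv_sqrt)   # normalized affinity matrix
    vals, vecs = eigsh(A, k=2, which="LA")          # two largest eigenpairs,
    fiedler = d_inv_sqrt * vecs[:, 0]               # ascending: col 0 = 2nd largest
    return fiedler > np.median(fiedler)             # boolean segment labels

img = np.random.rand(32, 32)                        # stand-in for a real image
labels = ncut_bipartition(grid_graph_weights(img)).reshape(img.shape)
</syntaxhighlight>

Applying the bipartition recursively, or clustering several eigenvectors at once, yields segmentations with more than two segments.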
=== Markov random fields ===
The application of [[Markov random field]]s (MRF) for images was suggested in early 1984 by Geman and Geman.<ref>S. Geman and D. Geman (1984): "Stochastic relaxation, Gibbs Distributions and Bayesian Restoration of Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 721–741, Vol. 6, No. 6.</ref> Their strong mathematical foundation and ability to provide a global optimum even when defined on local features proved to be the foundation for novel research in the domain of image analysis, de-noising and segmentation. MRFs are completely characterized by their prior probability distributions, marginal probability distributions, [[Clique (graph theory)|cliques]], smoothing constraint as well as criterion for updating values. The criterion for image segmentation using MRFs is restated as finding the labelling scheme which has maximum probability for a given set of features. The broad categories of image segmentation using MRFs are supervised and unsupervised segmentation.

==== Supervised image segmentation using MRF and MAP ====
In terms of image segmentation, the function that MRFs seek to maximize is the probability of identifying a labelling scheme given that a particular set of features is detected in the image. This is a restatement of the [[maximum a posteriori estimation]] method.

[[File:MRF neighborhood.png|thumb|right|MRF neighborhood for a chosen pixel]]

The generic algorithm for image segmentation using MAP is given below:
{{ordered list
| Define the neighborhood of each feature (random variable in MRF terms). <br/>Generally this includes 1st-order or 2nd-order neighbors.
| Set initial probabilities {{math|''P''(''f<sub>i</sub>'')}} for each feature as 0 or 1, where {{math|''f<sub>i</sub>'' ∈ Σ}} is the set containing features extracted <br/>for pixel {{mvar|i}}, and define an initial set of clusters.
| Using the training data, compute the mean ({{math|''μ''<sub>''ℓ''<sub>''i''</sub></sub>}}) and variance ({{math|''σ''<sub>''ℓ''<sub>''i''</sub></sub><sup>2</sup>}}) for each label. These are termed the class statistics.
| Compute the marginal distribution for the given labeling scheme {{math|''P''(''f<sub>i</sub>'' {{!}} ''ℓ''<sub>''i''</sub>)}} using [[Bayes' theorem]] and the class statistics calculated earlier. A Gaussian model is used for the marginal distribution:
:<math> P(f_i \mid \ell_i) = \frac{1}{\sigma(\ell_i) \sqrt{2\pi}} \, e^{-(f_i-\mu(\ell_i))^2/(2\sigma(\ell_i)^2)} </math>
| Calculate the probability of each class label given the neighborhood defined previously. <br/>[[Clique (graph theory)|Clique]] potentials are used to model the influence of neighboring labels on the label of the chosen pixel.
| Iterate over new prior probabilities and redefine clusters such that these probabilities are maximized. <br/>This is done using a variety of optimization algorithms described below.
| Stop when probability is maximized and the labeling scheme does not change. <br/>The calculations can be implemented in [[Log-likelihood|log likelihood]] terms as well.
}}

==== Optimization algorithms ====
Each optimization algorithm is an adaptation of models from a variety of fields, and they are set apart by their unique cost functions. The common trait of cost functions is to penalize change in pixel value as well as difference in pixel label when compared to labels of neighboring pixels.

===== Iterated conditional modes/gradient descent =====
The [[iterated conditional modes]] (ICM) algorithm tries to reconstruct the ideal labeling scheme by changing the value of each pixel over each iteration and evaluating the energy of the new labeling scheme using the cost function given below:
:<math> \alpha \bigl(1-\delta(\ell_i, \ell_{\text{initial } i})\bigr) + \beta \sum_{q \in N(i)} \bigl(1 - \delta(\ell_i, \ell_q)\bigr), </math>
where {{mvar|α}} is the penalty for change in pixel label and {{mvar|β}} is the penalty for difference in label between neighboring pixels and the chosen pixel. Here <math>N(i)</math> is the neighborhood of pixel {{mvar|i}} and {{mvar|δ}} is the Kronecker delta function. A major issue with ICM is that, similar to gradient descent, it has a tendency to settle in local optima and thus may not obtain a globally optimal labeling scheme.
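The following is a minimal sketch of ICM under the cost function above, assuming a 4-connected grid; the parameter values, sweep order, and function name are illustrative choices, with the {{mvar|α}} term penalizing deviation from the initial labeling.

<syntaxhighlight lang="python">
import numpy as np

def icm(initial_labels, alpha=1.0, beta=2.0, max_iter=20):
    """Iterated conditional modes on a 4-connected pixel grid.

    Local energy of assigning label c to pixel (i, j):
        alpha * (c != initial label)  +  beta * #(disagreeing 4-neighbours).
    Each sweep sets every pixel to its locally cheapest label; iteration
    stops when a full sweep changes nothing (a local optimum)."""
    labels = initial_labels.copy()
    h, w = labels.shape
    classes = np.unique(initial_labels)
    for _ in range(max_iter):
        changed = False
        for i in range(h):
            for j in range(w):
                nbrs = [labels[x, y]
                        for x, y in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                        if 0 <= x < h and 0 <= y < w]
                costs = [alpha * (c != initial_labels[i, j])
                         + beta * sum(c != n for n in nbrs)
                         for c in classes]
                best = classes[int(np.argmin(costs))]
                if best != labels[i, j]:
                    labels[i, j] = best
                    changed = True
        if not changed:
            break
    return labels
</syntaxhighlight>

Given a noisy per-pixel classification <code>noisy_labels</code>, a call such as <code>icm(noisy_labels, alpha=1.0, beta=2.0)</code> returns a spatially smoothed labeling.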
===== Simulated annealing (SA) =====
Derived as an analogue of annealing in metallurgy, [[simulated annealing]] (SA) changes pixel labels over iterations and evaluates the difference in energy between each newly formed labeling and the previous one,
:<math>\Delta U = U^\text{new} - U^\text{old}.</math>
The new labeling is kept if it lowers the energy, and otherwise kept only with a temperature-dependent probability:
:<math> \ell_i = \begin{cases} \ell_i^\text{new}, & \text{if } \Delta U \le 0, \\ \ell_i^\text{new}, & \text{if } \Delta U > 0 \text{ and } \delta < e^{-\Delta U / T}, \\ \ell_i^\text{old}, & \text{otherwise,} \end{cases} </math>
where {{mvar|δ}} is a uniform random number in (0, 1) and {{mvar|T}} is the temperature. Simulated annealing requires the input of a temperature schedule, which directly affects the speed of convergence of the system, as well as an energy threshold for minimization to occur.

===== Alternative algorithms =====
A range of other methods exist for solving simple as well as higher-order MRFs. They include maximization of the posterior marginal, multi-scale MAP estimation,<ref>C. Bouman and M. Shapiro (1994): "A multiscale random field model for Bayesian image segmentation", IEEE Transactions on Image Processing, pp. 162–177, Vol. 3.</ref> multiple-resolution segmentation<ref>J. Liu and Y. H. Yang (1994): "[https://ieeexplore.ieee.org/abstract/document/297949/ Multiresolution color image segmentation]", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 689–700, Vol. 16.</ref> and more. Apart from likelihood estimates, graph-cut using maximum flow<ref>S. Vicente, V. Kolmogorov and C. Rother (2008): "[http://www0.cs.ucl.ac.uk/staff/s.vicente/papers/connectedGC-CVPR08-TR.pdf Graph cut based image segmentation with connectivity priors]", CVPR</ref> and other highly constrained graph-based methods<ref>J. Corso, Z. Tu, and A. Yuille (2008): "MRF Labelling with Graph-Shifts Algorithm", Proceedings of the International Workshop on Combinatorial Image Analysis</ref><ref>B. J. Frey and D. MacKay (1997): "[http://papers.nips.cc/paper/1467-a-revolution-belief-propagation-in-graphs-with-cycles.pdf A Revolution: Belief Propagation in Graphs with Cycles]", Proceedings of Neural Information Processing Systems (NIPS)</ref> exist for solving MRFs.

==== Image segmentation using MAP and expectation–maximization ====
The [[expectation–maximization algorithm]] is utilized to iteratively estimate the posterior probabilities and distributions of labeling when no training data is available and no estimate of the segmentation model can be formed. A general approach is to use histograms to represent the features of an image and proceed as outlined briefly in this three-step algorithm (a code sketch follows the description):

1. A random estimate of the model parameters is utilized.

2. E step: Estimate the class statistics based on the random segmentation model defined. Using these, the [[conditional probability]] of belonging to a label given the feature set is calculated using [[Bayes' theorem]],
:<math> P(\lambda \mid f_i) = \frac{P(f_i \mid \lambda) P(\lambda)}{\sum_{\lambda' \in \Lambda} P(f_i \mid \lambda') P(\lambda')}, </math>
where <math>\lambda \in \Lambda</math>, the set of all possible labels.

3. M step: The established relevance of a given feature set to a labeling scheme is now used to compute the a priori estimate of a given label in the second part of the algorithm. Since the actual number of labels is unknown (no training data set is available), an estimate of the number of labels supplied by the user is utilized in the computations:
:<math> P(\lambda) = \frac{\sum_{f_i \in \Omega} P(\lambda \mid f_i)}{|\Omega|}, </math>
where <math>\Omega</math> is the set of all possible features.
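The following is a minimal sketch of the three steps above, assuming pixel intensities as the features {{math|''f<sub>i</sub>''}}, Gaussian class-conditional distributions as in the supervised case, and a user-supplied number of labels <code>k</code>; it is plain mixture-model EM without any MRF neighborhood term, and the function name is illustrative.

<syntaxhighlight lang="python">
import numpy as np

def em_segment(img, k=3, n_iter=50, seed=0):
    """Unsupervised EM on pixel intensities with k Gaussian classes
    (k supplied by the user, as in the text). Returns per-pixel labels."""
    rng = np.random.default_rng(seed)
    f = img.ravel().astype(float)                # feature set: one intensity per pixel
    # Step 1: random initial estimate of the model parameters
    mu = rng.choice(f, size=k)
    var = np.full(k, f.var() + 1e-6)
    prior = np.full(k, 1.0 / k)                  # P(lambda)
    for _ in range(n_iter):
        # Step 2 (E): P(lambda | f_i) via Bayes' theorem and class statistics
        lik = np.exp(-(f[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        post = lik * prior
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # Step 3 (M): update priors and class statistics from the posteriors
        nk = post.sum(axis=0)
        prior = nk / f.size                      # P(lambda) = sum_i P(lambda|f_i) / |Omega|
        mu = (post * f[:, None]).sum(axis=0) / (nk + 1e-12)
        var = (post * (f[:, None] - mu) ** 2).sum(axis=0) / (nk + 1e-12) + 1e-6
    return post.argmax(axis=1).reshape(img.shape)
</syntaxhighlight>

Here the array <code>post</code> holds the E-step posteriors {{math|''P''(''λ'' {{!}} ''f<sub>i</sub>'')}}, and the <code>prior</code> update implements the M-step formula above.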
[[File:Sample segmentation HMRF-EM.png|thumb|center|Segmentation of a color image using the HMRF-EM model]]

==== Disadvantages of MAP and EM based image segmentation ====
# Exact MAP estimates cannot be easily computed.
# Approximate MAP estimates are computationally expensive to calculate.
# Extension to multi-class labeling degrades performance and increases the storage required.
# Reliable estimation of parameters for EM is required for a global optimum to be achieved.
# Depending on the method of optimization, segmentation may converge to a local minimum.