== Trainable segmentation ==

Most of the aforementioned segmentation methods are based only on the color information of pixels in the image. Humans use much more knowledge when performing image segmentation, but implementing this knowledge would cost considerable human engineering and computational time, and would require a huge [[domain knowledge]] database which does not currently exist. Trainable segmentation methods, such as [[neural network]] segmentation, overcome these issues by modeling the domain knowledge from a dataset of labeled pixels.

An image segmentation [[neural network]] can process small areas of an image to extract simple features such as edges.<ref name="Transactions on Engineering, Computing and Technology">[[Mahinda Pathegama]] & Ö Göl (2004): "Edge-end pixel extraction for edge-based image segmentation", ''Transactions on Engineering, Computing and Technology,'' vol. 2, pp. 213–216, ISSN 1305-5313</ref> Another neural network, or any decision-making mechanism, can then combine these features to label the areas of an image accordingly. A type of network designed this way is the [[Kohonen map]].

[[Pulse-coupled networks|Pulse-coupled neural networks (PCNNs)]] are neural models proposed by modeling a cat's visual cortex and developed for high-performance [[biomimetic]] [[image processing]]. In 1989, Reinhard Eckhorn introduced a neural model to emulate the mechanism of a cat's visual cortex. The Eckhorn model provided a simple and effective tool for studying the visual cortex of small mammals, and was soon recognized as having significant application potential in image processing. In 1994, the Eckhorn model was adapted to be an image processing algorithm by John L. Johnson, who termed this algorithm Pulse-Coupled Neural Network.<ref>{{cite journal|last1=Johnson|first1=John L.|date=September 1994|title=Pulse-coupled neural nets: translation, rotation, scale, distortion, and intensity signal invariance for images|doi=10.1364/AO.33.006239|pmid=20936043|publisher=OSA|volume=33|journal=Applied Optics|number=26|pages=6239–6253|bibcode=1994ApOpt..33.6239J}}</ref> Over the past decade, PCNNs have been utilized for a variety of image processing applications, including image segmentation, feature generation, face extraction, motion detection, region growing, and noise reduction.

A PCNN is a two-dimensional neural network. Each neuron in the network corresponds to one pixel in an input image, receiving its corresponding pixel's color information (e.g. intensity) as an external stimulus. Each neuron also connects with its neighboring neurons, receiving local stimuli from them. The external and local stimuli are combined in an internal activation system, which accumulates the stimuli until it exceeds a dynamic threshold, resulting in a pulse output. Through iterative computation, PCNN neurons produce temporal series of pulse outputs. These temporal series contain information about the input images and can be utilized for various image processing applications, such as image segmentation and feature generation. Compared with conventional image processing methods, PCNNs have several significant merits, including robustness against noise, independence from geometric variations in input patterns, and the capability of bridging minor intensity variations in input patterns.
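The following minimal sketch illustrates one common simplified PCNN formulation in Python. The update rules, coupling kernel, and parameter values shown here are illustrative choices for this example, not those of any single published model:

<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import convolve

def pcnn_segment(image, steps=20, alpha_f=0.1, alpha_l=1.0, alpha_t=0.3,
                 beta=0.2, v_f=0.5, v_l=0.2, v_t=20.0):
    """Run a simplified pulse-coupled neural network over a 2-D image
    with intensities in [0, 1]; return the binary pulse map of each step."""
    kernel = np.array([[0.5, 1.0, 0.5],
                       [1.0, 0.0, 1.0],
                       [0.5, 1.0, 0.5]])      # local coupling weights (illustrative)
    feed = np.zeros_like(image)               # feeding input (external stimulus)
    link = np.zeros_like(image)               # linking input (neighbour stimulus)
    thresh = np.ones_like(image)              # dynamic threshold
    pulse = np.zeros_like(image)              # binary pulse output
    maps = []
    for _ in range(steps):
        coupled = convolve(pulse, kernel, mode="constant")
        feed = np.exp(-alpha_f) * feed + v_f * coupled + image
        link = np.exp(-alpha_l) * link + v_l * coupled
        act = feed * (1.0 + beta * link)      # combined internal activation
        pulse = (act > thresh).astype(float)  # fire where activation beats threshold
        thresh = np.exp(-alpha_t) * thresh + v_t * pulse  # firing raises the threshold
        maps.append(pulse.copy())
    return maps

# Toy usage: the bright square pulses in unison, separating it from the
# background after a few iterations.
image = np.zeros((32, 32))
image[8:24, 8:24] = 0.9
pulse_maps = pcnn_segment(image)
</syntaxhighlight>

Neurons covering regions of similar intensity tend to fire on the same iterations, so each pulse map groups such regions together; this synchrony is what makes the pulse trains usable as segmentation candidates.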
In 2015, [[convolutional neural network|convolutional neural networks]] reached the state of the art in semantic segmentation.<ref>{{Cite conference|last1=Long|first1=Jonathan|last2=Shelhamer|first2=Evan|last3=Darrell|first3=Trevor|date=2015|title=Fully Convolutional Networks for Semantic Segmentation|url=https://openaccess.thecvf.com/content_cvpr_2015/html/Long_Fully_Convolutional_Networks_2015_CVPR_paper.html|conference=Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition|pages=3431–3440}}</ref> [[U-Net]] is an architecture which takes an image as input and outputs a label for each pixel.<ref>{{cite arXiv|last1=Ronneberger|first1=Olaf|last2=Fischer|first2=Philipp|last3=Brox|first3=Thomas|title=U-Net: Convolutional Networks for Biomedical Image Segmentation|eprint=1505.04597|date=2015|class=cs.CV}}</ref> U-Net was initially developed to detect cell boundaries in biomedical images. It follows the classical [[autoencoder]] architecture and as such contains two sub-structures. The encoder structure follows the traditional stack of convolutional and max-pooling layers to increase the receptive field as it goes through the layers; it is used to capture the context in the image. The decoder structure utilizes transposed convolution layers for upsampling so that the end dimensions are close to those of the input image. Skip connections are placed between convolution and transposed convolution layers of the same shape in order to preserve details that would otherwise have been lost.

In addition to pixel-level semantic segmentation tasks, which assign a given category to each pixel, modern segmentation applications include instance-level semantic segmentation tasks, in which each individual in a given category must be uniquely identified, as well as panoptic segmentation tasks, which combine these two tasks to provide a more complete scene segmentation.<ref name="Panoptic Segmentation"/>
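The U-Net encoder/decoder pattern described above can be illustrated with the following minimal sketch in Python using [[PyTorch]]. The framework, channel counts, and network depth are illustrative choices for this example, not those of the original paper:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    """Two 3x3 convolutions with ReLU, the basic U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, n_classes=2):
        super().__init__()
        self.enc1 = double_conv(in_ch, 16)    # encoder: captures context
        self.enc2 = double_conv(16, 32)
        self.pool = nn.MaxPool2d(2)           # downsampling grows the receptive field
        self.bottleneck = double_conv(32, 64)
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)  # decoder: upsampling
        self.dec2 = double_conv(64, 32)       # 64 = 32 upsampled + 32 skipped channels
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = double_conv(32, 16)
        self.head = nn.Conv2d(16, n_classes, 1)  # per-pixel class scores

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        # Skip connections concatenate same-shape encoder features into the
        # decoder, preserving detail that pooling would otherwise discard.
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)                  # one score vector per pixel

# Usage: per-pixel logits with the same spatial size as the input.
logits = TinyUNet()(torch.randn(1, 1, 64, 64))  # shape (1, 2, 64, 64)
</syntaxhighlight>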