===Generalization and statistics===
{{No footnotes|date=August 2019|section}}
Applications whose goal is to create a system that generalizes well to unseen examples face the possibility of [[Overfitting|over-training]]. This arises in over-complex or over-specified systems when the network's capacity significantly exceeds the needed free parameters. Two approaches address over-training. The first is to use [[cross-validation (statistics)|cross-validation]] and similar techniques to check for the presence of over-training and to select hyperparameters that minimize the generalization error. The second is to use some form of ''[[regularization (mathematics)|regularization]]''. This concept emerges in a probabilistic (Bayesian) framework, where regularization can be performed by assigning a larger prior probability to simpler models, but also in statistical learning theory, where the goal is to minimize two quantities: the 'empirical risk' and the 'structural risk', which roughly correspond to the error over the training set and the predicted error on unseen data due to overfitting.

[[File:Synapse deployment.jpg|thumb|right|upright=1.15|Confidence analysis of a neural network]]
Supervised neural networks that use a [[mean squared error]] (MSE) cost function can use formal statistical methods to determine the confidence of the trained model. The MSE on a validation set can be used as an estimate of variance. This value can then be used to calculate the [[confidence interval]] of the network's output, assuming a [[normal distribution]]. A confidence analysis made this way is statistically valid as long as the output [[probability distribution]] stays the same and the network is not modified.

By assigning a [[softmax activation function]], a generalization of the [[logistic function]], to the output layer of the neural network (or a softmax component in a component-based network) for categorical target variables, the outputs can be interpreted as posterior probabilities. This is useful in classification, as it gives a certainty measure on classifications.

The softmax activation function is:
:<math>y_i=\frac{e^{x_i}}{\sum_{j=1}^c e^{x_j}}</math>
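For illustration, a minimal sketch of the regularization approach described above, using an L2 (weight-decay) penalty; the function and parameter names (<code>l2_regularized_loss</code>, <code>lam</code>) are illustrative, not from any particular library:

<syntaxhighlight lang="python">
import numpy as np

def l2_regularized_loss(empirical_risk, weights, lam=1e-3):
    # Add an L2 penalty on the weights to the training (empirical) loss.
    # The penalty acts as a complexity term in the spirit of the
    # structural risk, discouraging over-complex models.
    return empirical_risk + lam * np.sum(weights ** 2)

# Example: a training MSE of 0.05 with a small, invented weight matrix.
W = np.array([[0.5, -1.2], [0.3, 0.8]])
print(l2_regularized_loss(0.05, W))
</syntaxhighlight>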
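Similarly, a sketch of the confidence analysis described above, assuming normally distributed errors with constant variance; the validation data and the new prediction <code>y_hat</code> are invented for illustration:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

# Invented validation targets and network predictions.
targets = np.array([1.0, 2.0, 3.0, 4.0])
predictions = np.array([1.1, 1.9, 3.2, 3.8])

# Validation MSE used as the variance estimate.
sigma = np.sqrt(np.mean((predictions - targets) ** 2))

y_hat = 2.5          # a new network output
z = norm.ppf(0.975)  # two-sided 95% confidence level (about 1.96)
print(f"95% CI: [{y_hat - z * sigma:.2f}, {y_hat + z * sigma:.2f}]")
</syntaxhighlight>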
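Finally, the softmax function can be written directly from the formula above; this NumPy version shifts the inputs by their maximum for numerical stability, which leaves the result unchanged:

<syntaxhighlight lang="python">
import numpy as np

def softmax(x):
    # Subtracting max(x) avoids overflow in exp(); softmax is invariant
    # to adding a constant to every input.
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # components sum to 1
</syntaxhighlight>
<section end="theory" />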